Closed colinsurprenant closed 5 years ago
Impressive digging. I'm +1 on the fix itself.
It appears that the source of the bug is https://github.com/jruby/jruby/commit/d1a760e066343e7a4956bbdd7be7975b1eda04cd in JRuby 9.1.8.0, a divergence from the stdlib's implementation that adds support for compressed IPv6.
If this is the case, I think we should arrange for a fix to be applied upstream on JRuby, and should limit our monkey-patching to only affected versions of JRuby; an open-ended monkey-patch may bite us later when the underlying library changes its implementation details.
@yaauie thanks. Yes, I am planning on submitting issues/PR upstream in JRuby and Ruby.
For the monkey-patch strategy, since we don't know when it will be fixed upstream we have 2 choices:
resolv.rb
and eventually break. An alternative would be to raise and exception in this patch to check if the JRuby version has changed to force the validation of this patch. This will be annoying until the fix is included upstream but should be safe to avoid both potential problems above. WDYT?
upstream in JRuby and Ruby
The bug is in a JRuby divergence from the Ruby stdlib implementation, so no issue/patch for Ruby needed.
I trust your judgement to mitigate the risk of applying the patch to an incompatible version.
Your suggestion to error seems a bit heavy-handed and I fear a gap in CI coverage could lead to a release of Logstash that errors when using a configuration including this plugin. If you don't fear such a gap, I'll trust your judgement.
Alternatively:
ENV['RESOLV_PATCH_CACHE_LEAK']
(enabling a path to opt-out)@yaauie
This PR already only applies the patch if JRuby >= 9.2.0.0 and the previous patch is still there for when JRuby is >= 9.1.13.0 && < 9.1.16.0. Not sure about your «Skip patch if JRuby < 9.1.18.0» ? (BTW there is no JRuby 9.1.18.0, AFAICS versions tags are 9.1.17.0 and the next version is 9.2.0.0).
I a not sure I like the idea of the env var opt out, it adds some complexity mostly in communicating this "feature".
Maybe another option could be to, instead of applying a monkey patch, to include a complete patched resolv.rb
file which would protect us from a potential future incompatible monkey-patch. On top of this we could add a spec in logstash-core that will fail on a JRuby update in the code to warn to verify for the inclusion of the fix in resolv.rb upstream which will allow us to know the exact version when the fix will be in and come back here in the dns filter to update the JRuby version range.
@yaauie Per my last comment I added the full & patched resolv.rb
which should minimize potential mokey patch conflicts with future JRuby upgrades since we can't put a version upper bound for applying this patch. If we agree on this solution, I will also followup with a spec in logstash-core per my above comment.
LMKWYT
Either solution works for me. I don't care to hold you up by nitpicking.
LGTM.
thanks @yaauie - merging and publishing.
Fixes #51
What
resolv.rb
was updated from upstream Ruby stdlib starting at JRuby 9.2.0.0 which essentially re-introduced issue #40 where all request will eventually systematically time out.This problem will only be triggered if either the
nameserver
option has more than one host OR if there is nonameserver
option but there are more than one host in/etc/resolv.conf
resulting in the usage of theUnconnectedUDP
class inresolv.rb
.Why
Essentially the problem is that the
DNS.allocate_request_id(host, port)
call will cache the allocated id using thehost
andport
parameters.But the cache cleanup will be done in the
close
method by the DNS.free_request_id(service[0], service[1], id) call where theservice[0]
will beIPAddr.new(host)
and not the originalhost
string used in thesender
method leading to not finding the original cache key and not freeing it from the cache which is 64k elements wide and once full, theallocate_request_id
method will loop forever.Fix
This fix just makes sure that the exact same parameters are used in both
DNS.allocate_request_id
andDNS.free_request_id
using theIPAddr#to_s
which will always yield the same host string.Tests
ipgenerator
input plugins which generates unique ip addresses and the dns filter with two hosts in thenameserver
option and confirm bug starting at 64k ip addresses generated. Then re-tested with patch applied.Workaround
A temporaty workaround is to either use the
nameserver
option with a single host entry OR to no use thenameserver
option and have a single entry in/etc/resolv.conf
.