logstash-plugins / logstash-filter-dns

Apache License 2.0

Many ResolvTimeouts cause logstash crash #30

Open jsvd opened 8 years ago

jsvd commented 8 years ago

Created by @Danko90, moved from https://github.com/elastic/logstash/issues/6115


Hi, I tried to use the dns filter (dns.rb) to cache the requests in a local DNS server. I noticed that when you resolve several domain names, even with the dig tool, you sometimes get a SERVFAIL error (the DNS server doesn’t respond). This error may be caused by a bad DNS server configuration, so logstash waits for an answer until the timeout expires. While logstash is waiting for that answer, its queue fills up with the other requests, and logstash crashes when it tries to resolve many requests that never get a response. For instance, if logstash has just one request that causes a ResolvTimeout and N other requests, it works well. If logstash has N requests that cause ResolvTimeout errors and N other requests, it crashes. To fix this problem I modified dns.rb as shown below:

  # Set this flag to true to cache DNS requests that time out
  config :enable_cache_timeout, :validate => :boolean, :default => false

      rescue Resolv::ResolvError
        @failed_cache[raw] = true if @failed_cache
        @logger.debug("DNS: couldn't resolve the hostname.",
                      :field => field, :value => raw)
        return
      rescue Resolv::ResolvTimeout, Timeout::Error
        # Cache the timed-out lookup only when the new option is enabled
        if enable_cache_timeout
          @failed_cache[raw] = true if @failed_cache
        end
        @logger.error("DNS: timeout on resolving the hostname.",
                      :field => field, :value => raw)
        return
      rescue SocketError => e
        @logger.error("DNS: Encountered SocketError.",
                      :field => field, :value => raw, :message => e.message)
        return
      end

At line 66 I added an option that enables caching on timeout. At line 128 a DNS request that times out is now stored in the failed cache when that option is enabled.
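To make the intended behaviour easier to discuss, here is a minimal standalone sketch of the same idea. It is not the plugin's code: `CachingResolver`, its `resolver:` argument, and the `lookups` counter are hypothetical names used only for illustration, and a plain Hash stands in for the filter's failed cache (so entries never expire here).

```ruby
require "resolv"
require "timeout"

# Hypothetical stand-in for the filter's lookup path, for illustration only.
class CachingResolver
  attr_reader :lookups

  # resolver:       any object responding to #call(hostname)
  # cache_timeouts: plays the role of the proposed enable_cache_timeout option
  def initialize(resolver:, cache_timeouts: false)
    @resolver = resolver
    @cache_timeouts = cache_timeouts
    @failed_cache = {} # assumption: a Hash with no expiry, unlike a real TTL cache
    @lookups = 0
  end

  # Returns the resolved value, or nil when the lookup failed or was skipped.
  def resolve(name)
    # Skip names that already failed so they do not block again.
    return nil if @failed_cache[name]

    @lookups += 1
    @resolver.call(name)
  rescue Resolv::ResolvError
    # Hard failures are always remembered.
    @failed_cache[name] = true
    nil
  rescue Resolv::ResolvTimeout, Timeout::Error
    # Timeouts are remembered only when the option is on.
    @failed_cache[name] = true if @cache_timeouts
    nil
  end
end

# A resolver that always times out, standing in for an unresponsive DNS server.
always_timeout = ->(_name) { raise Resolv::ResolvTimeout }

cached = CachingResolver.new(resolver: always_timeout, cache_timeouts: true)
3.times { cached.resolve("unreachable.example") }

uncached = CachingResolver.new(resolver: always_timeout, cache_timeouts: false)
3.times { uncached.resolve("unreachable.example") }
```

With the option on, only the first lookup of a given name pays the timeout; with it off, every event carrying that name waits again, which matches the pile-up described above.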

I don’t know if this change is logically correct, but it helps me avoid further logstash crashes. I would like an opinion from a logstash expert; any help will be appreciated.

Thanks, Danilo

pemontto commented 7 years ago

I'm definitely running into this issue. Would be great to see this handled