logstash-plugins / logstash-input-gelf

Apache License 2.0
20 stars 39 forks source link

Negative exponent panic #70

Closed wavded closed 2 years ago

wavded commented 2 years ago

Logstash information:

Please include the following information:

  1. Logstash version (e.g. bin/logstash --version): 8.1.1
  2. Logstash installation source: DEB/Ubuntu
  3. How is Logstash being run: systemd

JVM (e.g. java -version): Bundled JVM

OS version (uname -a if on a Unix-like system): Ubuntu 20.04 LTS

Description of the problem including expected versus actual behavior:

Every once in a while the GELF input will panic and the service will restart. I am not really sure how to debug the issue but I provided the logging I do receive. It seems like something is wrong with the timestamp from what I can tell but I'm not sure how to view the problematic message in order to debug. Ultimately it would be helpful if the plugin could recover from this error.

Steps to reproduce:

Would be happy to provide steps to produce if someone could point me in the direction of how to get more helpful logs as to the message that caused the error.

Provide logs (if relevant):

[2022-03-23T13:42:40,039][ERROR][logstash.javapipeline    ][main][f20f560a3fd1047d8bdcd251052a0d5b27a421cd304a6c3d8ccb28aad285d926] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::Gelf type=>"gelf", port=>12201, id=>"f20f560a3fd1047d8bdcd251052a0d5b27a421cd304a6c3d8ccb28aad285d926", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_ec56fad7-dfdc-447d-813c-3c42bbb01c52", enable_metric=>true, charset=>"UTF-8">, host=>"0.0.0.0", remap=>true, strip_leading_underscore=>true, use_tcp=>false, use_udp=>true>
  Error: Negative exponent
  Exception: Java::JavaLang::ArithmeticException
  Stack: java.base/java.math.BigInteger.pow(BigInteger.java:2401)
org.jruby.ext.bigdecimal.RubyBigDecimal.to_r(RubyBigDecimal.java:1906)
org.jruby.ext.bigdecimal.RubyBigDecimal$INVOKER$i$0$0$to_r.call(RubyBigDecimal$INVOKER$i$0$0$to_r.gen)
org.jruby.RubyClass.checkFuncallDefault(RubyClass.java:659)
org.jruby.RubyClass.finvokeChecked(RubyClass.java:603)
org.jruby.RubyTime.numExact(RubyTime.java:261)
org.jruby.RubyTime.at(RubyTime.java:1260)
org.logstash.ext.JrubyTimestampExtLibrary$RubyTimestamp.ruby_at(JrubyTimestampExtLibrary.java:216)
usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_gelf_minus_3_dot_3_dot_1.lib.logstash.inputs.gelf.RUBY$method$coerce_timestamp$0(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-gelf-3.3.1/lib/logstash/inputs/gelf.rb:246)
usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_gelf_minus_3_dot_3_dot_1.lib.logstash.inputs.gelf.RUBY$method$new_event$0(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-gelf-3.3.1/lib/logstash/inputs/gelf.rb:232)
usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_gelf_minus_3_dot_3_dot_1.lib.logstash.inputs.gelf.RUBY$method$udp_listener$0(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-gelf-3.3.1/lib/logstash/inputs/gelf.rb:206)
usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_gelf_minus_3_dot_3_dot_1.lib.logstash.inputs.gelf.RUBY$method$udp_listener$0$__VARARGS__(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-gelf-3.3.1/lib/logstash/inputs/gelf.rb:180)
org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:80)
org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:70)
org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:207)
usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_gelf_minus_3_dot_3_dot_1.lib.logstash.inputs.gelf.RUBY$block$run$2(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-gelf-3.3.1/lib/logstash/inputs/gelf.rb:79)
org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:138)
org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:58)
org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:52)
org.jruby.runtime.Block.call(Block.java:139)
org.jruby.RubyProc.call(RubyProc.java:318)
org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:105)
java.base/java.lang.Thread.run(Thread.java:829)
[2022-03-23T13:42:40,047][FATAL][org.logstash.Logstash    ] Logstash stopped processing because of an error: (SystemExit) exit
org.jruby.exceptions.SystemExit: (SystemExit) exit
        at org.jruby.RubyKernel.exit(org/jruby/RubyKernel.java:747) ~[jruby.jar:?]
        at org.jruby.RubyKernel.exit(org/jruby/RubyKernel.java:710) ~[jruby.jar:?]
        at usr.share.logstash.lib.bootstrap.environment.<main>(/usr/share/logstash/lib/bootstrap/environment.rb:94) ~[?:?]
logicsys commented 2 years ago

I am experiencing the same issue.

This appears to be related to the change brought in by f9de73c473839b6f2b2f9802d8bac0435d50e6ee

when i replace lines 244-246 of /lib/logstash/inputs/gelf.rb with the previous commits content for function self.coerce_timestamp

latest commit:

  def self.coerce_timestamp(timestamp)
    # prevent artificial precision from being injected by floats
    timestamp = timestamp.rationalize if timestamp.kind_of?(Float)
    LogStash::Timestamp.at(timestamp)
  end

previous commit:

  def self.coerce_timestamp(timestamp)
    # bug in JRuby prevents correcly parsing a BigDecimal fractional part, see https://github.com/elastic/logstash/issues/4565
    timestamp.is_a?(BigDecimal) ? LogStash::Timestamp.at(timestamp.to_i, timestamp.frac * 1000000) : LogStash::Timestamp.at(timestamp)
  end

i am no longer seeing Java::JavaLang::ArithmeticException errors, and logs are passed as expected to my outputs.

I am using logstash 7.17.1

jugsofbeer commented 2 years ago

Im suffering from the same issue. Whats been posted above matchs my scenario .

It occurs on Java 11 and 14 for us, so not limited to just the baked in jdk

shixinyu commented 2 years ago

I am experiencing the same issue.

This appears to be related to the change brought in by f9de73c

when i replace lines 244-246 of /lib/logstash/inputs/gelf.rb with the previous commits content for function self.coerce_timestamp

latest commit:

  def self.coerce_timestamp(timestamp)
    # prevent artificial precision from being injected by floats
    timestamp = timestamp.rationalize if timestamp.kind_of?(Float)
    LogStash::Timestamp.at(timestamp)
  end

previous commit:

  def self.coerce_timestamp(timestamp)
    # bug in JRuby prevents correcly parsing a BigDecimal fractional part, see https://github.com/elastic/logstash/issues/4565
    timestamp.is_a?(BigDecimal) ? LogStash::Timestamp.at(timestamp.to_i, timestamp.frac * 1000000) : LogStash::Timestamp.at(timestamp)
  end

i am no longer seeing Java::JavaLang::ArithmeticException errors, and logs are passed as expected to my outputs.

I am using logstash 7.17.1

This solution is working for me, Logstash 8.1.1

jugsofbeer commented 2 years ago

Hi, Can someone confirm if/when this fix will make it into the v7.17.x range if it is already fixed in v8.1.1 ?

selfisch commented 2 years ago

Hi, Can someone confirm if/when this fix will make it into the v7.17.x range if it is already fixed in v8.1.1 ?

Without having tested 8.x yet, I think it is not fixed in any version after 7.17.x. The working solution currently depends on a manual fix, by changing the above mentioned lines.

mfilocha commented 2 years ago

I can confirm that v7.17 with gelf-input plugin 3.3.1 is affected.

Reverting to v.7.16.1 with gelf plugin 3.3.0 makes logstash working again.

geekpete commented 2 years ago

If anyone can provide a tcpdump to show an example input into this plugin that can then be used to reproduce this crash, that will greatly assist a fix for it and test any fix then works against the same input to avoid the crash.

geekpete commented 2 years ago

After packet sniffing to capture raw shipped gelf events known to crash the new version of the plugin, it was observed that the raw event was being shipped with the timestamp sent in as an exponent numeric value.

eg: 1.660873462488E9

Testing some of these sample timestamps captured before and around the time of the crash didn't seem to help reproduce the crash, but testing with larger and larger exponents eventually saw the crash reproduced.

For large enough exponents the 3.3.0 plugin version was also able to be crashed but with a slightly different error:

An unexpected error occurred! {:error=>#<RangeError: bignum too big to convert into `long'>, :backtrace=>["org/jruby/RubyTime.java:1331:in `at'", "org/jruby/RubyTime.java

rather than the 3.3.1 error:

java.lang.ArithmeticException: Negative exponent at java.base/java.math.BigInteger.pow(BigInteger.java

Reproducing the crash

Using a basic pipeline config to open gelf input and output events to a file:

input {
    gelf {
        port => 44384
        use_udp => true
        remap => false
        strip_leading_underscore => true
        tags => ["test-gelf"]
    }
}

output {
    file {
        path => "test.log"
        codec => "json_lines"
    }
}

then testing Logstash 8.3.3 with both logstash-input-gelf version 3.3.0 and 3.3.1.

./bin/logstash-plugin install --version 3.3.0 logstash-input-gelf or ./bin/logstash-plugin install --version 3.3.1 logstash-input-gelf

A script to send a test event at the gelf input:

read -r -d '' gelf_message <<EOF
{
  "version": "1.0",
  "timestamp": 1.660873462488E13
}
EOF

#1.660873462488E12 - no crash but very far future year/date
#1.660873462488E13 - negative exponent error in 3.3.1 but not in 3.3.0
#16608734624880 - alternative representation of the E13 exponent value which avoids the crash in both plugin versions

echo Plain
echo -----
echo -n "${gelf_message}"
echo ==============
echo
echo

echo Send
echo ----
echo  "${gelf_message}"| gzip -c -f - | nc -w 1 -u localhost 44384 
echo ==============
echo
echo

A timestamp value of 1.660873462488E13 crashes the 3.3.1 version of the plugin, but not the 3.3.0 version using the previous code commit which is able to ingest the timestamp without issue:

{"version":"1.0","tags":["test-gelf"],"@timestamp":"+528279-11-06T19:48:00Z","@version":"1","source_host":"127.0.0.1"}

Also interesting to note is the equivalent non-exponent version of the same value 16608734624880 doesn't reproduce the crash in either version of the plugin and produces the same event above.

Do we just need more guards around the input value for timestamp to avoid this crash and discard+log the event with the unparsable timestamp value? Or expose these as options in the plugin to allow a user to decide what to do with values like this and allow them through somehow but tagged as unparsable for processing later?

Or is there a safer way to parse these types of exponent values into the non-exponent values that seem to be easily consumed?

bataya0 commented 2 years ago

I was able to reproduce the issue running a v8.2.0 logstash using the example GELF input string { "version": "1.0", "timestamp": 1.660873462488E13 } and logstash crashed.

I then manually copied the two lines from the above fix/commit and applied to my test machine (manually updated the gelf.rb file line 246 ) and started logstash v8.2.0 and sent in the string { "version": "1.0", "timestamp": 1.660873462488E13 } and logstash DID NOT crash and ran as expected and data arrived as expected.

I then repeated this on the current latest logstash available version ( 8.3.3 )

I was able to reproduce the issue running a v8.3.3 logstash using the example GELF input string { "version": "1.0", "timestamp": 1.660873462488E13 } and logstash crashed.

I then manually copied the two lines from the above fix/commit and applied to my test machine (manually updated the gelf.rb file line 246 ) and started logstash v8.3.3 and sent in the string { "version": "1.0", "timestamp": 1.660873462488E13 } and logstash DID NOT crash and ran as expected and data arrived as expected.