logstash-plugins / logstash-input-file

Apache License 2.0
68 stars 101 forks source link

Fix: sincedb_write issue on Windows machines #283

Closed fishie closed 3 years ago

fishie commented 3 years ago

On our Windows servers we for some reason, occasionally, get exceptions with classes IOError and Java::JavaLang:RuntimeException when writing to sincedb file which crashes the plugin and somehow messes with the sincedb file which results in old log files being reprocessed causing duplicates. If we rescue the errors then everything is fine which is the reason for this change.

This is an example of the IOError we get:

[2021-03-10T20:34:56,562][ERROR][logstash.javapipeline    ][main][application-customer-input] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::File start_position=>"beginning", path=>["D:/Logs/customer/*.txt"], codec=><LogStash::Codecs::Multiline pattern=>"^%{TIMESTAMP_ISO8601:timestamp}", what=>"previous", id=>"7933efd1-d366-43f1-9669-c388b304d745", negate=>true, enable_metric=>true, charset=>"UTF-8", multiline_tag=>"multiline", max_lines=>500, max_bytes=>10485760>, sincedb_clean_after=>1209600.0, ignore_older=>1123200.0, id=>"application-customer-input", type=>"application-customer", enable_metric=>true, stat_interval=>1.0, discover_interval=>15, sincedb_write_interval=>15.0, delimiter=>"\n", close_older=>3600.0, mode=>"tail", file_completed_action=>"delete", file_chunk_size=>32768, file_chunk_count=>140737488355327, file_sort_by=>"last_modified", file_sort_direction=>"asc", exit_after_read=>false, check_archive_validity=>false>
  Error: The handle is invalid
  Exception: IOError
  Stack: org/jruby/RubyIO.java:1234:in `sysopen'
org/jruby/RubyIO.java:1214:in `sysopen'
C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/sincedb_collection.rb:235:in `non_atomic_write'
org/jruby/RubyMethod.java:119:in `call'
C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/sincedb_collection.rb:212:in `sincedb_write'
C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/sincedb_collection.rb:185:in `flush_at_interval'
C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/watch.rb:57:in `subscribe'
C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/observing_tail.rb:12:in `subscribe'
C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/logstash/inputs/file.rb:364:in `run'
C:/logstash-7.11.1/logstash-core/lib/logstash/java_pipeline.rb:405:in `inputworker'
C:/logstash-7.11.1/logstash-core/lib/logstash/java_pipeline.rb:396:in `block in start_input'
[2021-03-10T20:34:57,683][INFO ][filewatch.observingtail  ][main][application-customer-input] QUIT - closing all files and shutting down.

This is an example of the Java::JavaLang:RuntimeException we get:

[2021-03-12T02:06:43,247][ERROR][logstash.javapipeline    ][main][application-product-api-input] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::File start_position=>"beginning", path=>["D:/Logs/productapi/*.txt"], codec=><LogStash::Codecs::Multiline pattern=>"^%{TIMESTAMP_ISO8601:timestamp}", what=>"previous", id=>"91a3327c-f546-4b06-a0a6-63cd41dd4ac9", negate=>true, enable_metric=>true, charset=>"UTF-8", multiline_tag=>"multiline", max_lines=>500, max_bytes=>10485760>, sincedb_clean_after=>1209600.0, ignore_older=>1123200.0, id=>"application-product-api-input", type=>"application-product-api", enable_metric=>true, stat_interval=>1.0, discover_interval=>15, sincedb_write_interval=>15.0, delimiter=>"\n", close_older=>3600.0, mode=>"tail", file_completed_action=>"delete", file_chunk_size=>32768, file_chunk_count=>140737488355327, file_sort_by=>"last_modified", file_sort_direction=>"asc", exit_after_read=>false, check_archive_validity=>false>
  Error: unknown IOException: java.io.IOException: The handle is invalid
  Exception: Java::JavaLang::RuntimeException
  Stack: org.jruby.util.io.PosixShim.close(PosixShim.java:259)
org.jruby.util.io.PosixShim.close(PosixShim.java:247)
org.jruby.util.io.OpenFile.finalizeFlush(OpenFile.java:899)
org.jruby.RubyIO.rbIoClose(RubyIO.java:2044)
org.jruby.RubyIO.close(RubyIO.java:1990)
org.jruby.RubyIO$INVOKER$i$0$0$close.call(RubyIO$INVOKER$i$0$0$close.gen)
org.jruby.RubyClass.checkFuncallDefault(RubyClass.java:659)
org.jruby.RubyClass.finvokeChecked(RubyClass.java:603)
org.jruby.runtime.Helpers.invokeChecked(Helpers.java:501)
org.jruby.RubyBasicObject.checkCallMethod(RubyBasicObject.java:346)
org.jruby.RubyIO.ioClose(RubyIO.java:2003)
org.jruby.RubyIO.ensureYieldClose(RubyIO.java:1166)
org.jruby.RubyIO.open(RubyIO.java:1158)
C_3a_.logstash_minus_7_dot_11_dot_1.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_3.lib.filewatch.sincedb_collection.RUBY$method$non_atomic_write$0(C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/sincedb_collection.rb:235)
org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:106)
org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:140)
org.jruby.RubyMethod.call(RubyMethod.java:119)
org.jruby.RubyMethod$INVOKER$i$call.call(RubyMethod$INVOKER$i$call.gen)
org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroOrOneOrNBlock.call(JavaMethod.java:349)
C_3a_.logstash_minus_7_dot_11_dot_1.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_3.lib.filewatch.sincedb_collection.RUBY$method$sincedb_write$0(C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/sincedb_collection.rb:212)
org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:80)
org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:103)
org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:140)
org.jruby.ir.targets.InvokeSite.fail(InvokeSite.java:278)
C_3a_.logstash_minus_7_dot_11_dot_1.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_3.lib.filewatch.sincedb_collection.RUBY$method$flush_at_interval$0(C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/sincedb_collection.rb:185)
C_3a_.logstash_minus_7_dot_11_dot_1.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_3.lib.filewatch.sincedb_collection.RUBY$method$flush_at_interval$0$__VARARGS__(C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/sincedb_collection.rb)
org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:80)
org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:70)
org.jruby.ir.targets.InvokeSite.fail(InvokeSite.java:248)
org.jruby.ir.targets.InvokeSite.fail(InvokeSite.java:255)
C_3a_.logstash_minus_7_dot_11_dot_1.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_3.lib.filewatch.watch.RUBY$method$subscribe$0(C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/watch.rb:57)
C_3a_.logstash_minus_7_dot_11_dot_1.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_3.lib.filewatch.watch.RUBY$method$subscribe$0$__VARARGS__(C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/watch.rb)
org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:80)
org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:70)
org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:207)
C_3a_.logstash_minus_7_dot_11_dot_1.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_3.lib.filewatch.observing_tail.RUBY$method$subscribe$0(C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/observing_tail.rb:12)
C_3a_.logstash_minus_7_dot_11_dot_1.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_3.lib.filewatch.observing_tail.RUBY$method$subscribe$0$__VARARGS__(C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/filewatch/observing_tail.rb)
org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:80)
org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:70)
org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:207)
C_3a_.logstash_minus_7_dot_11_dot_1.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_3.lib.logstash.inputs.file.RUBY$method$run$0(C:/logstash-7.11.1/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.3/lib/logstash/inputs/file.rb:364)
org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:106)
org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:140)
org.jruby.ir.targets.InvokeSite.fail(InvokeSite.java:278)
C_3a_.logstash_minus_7_dot_11_dot_1.logstash_minus_core.lib.logstash.java_pipeline.RUBY$method$inputworker$0(C:/logstash-7.11.1/logstash-core/lib/logstash/java_pipeline.rb:405)
C_3a_.logstash_minus_7_dot_11_dot_1.logstash_minus_core.lib.logstash.java_pipeline.RUBY$block$start_input$1(C:/logstash-7.11.1/logstash-core/lib/logstash/java_pipeline.rb:396)
org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:138)
org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:58)
org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:52)
org.jruby.runtime.Block.call(Block.java:139)
org.jruby.RubyProc.call(RubyProc.java:318)
org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:105)
java.base/java.lang.Thread.run(Thread.java:834)
[2021-03-12T02:06:44,304][INFO ][filewatch.observingtail  ][main][application-product-api-input] QUIT - closing all files and shutting down.
[2021-03-12T02:06:44,314][INFO ][filewatch.observingtail  ][main][application-product-api-input] START, creating Discoverer, Watch with file and sincedb collections

With the change we get:

[2021-03-15T10:25:47,278][WARN ][filewatch.sincedbcollection][main][application-pharmacy-api-input] sincedb_write: C:/logstash-7.11.1/data/plugins/inputs/file/.sincedb_334238b39b987e24f81f35882286bcf3 error: {:exception=>IOError, :message=>"The handle is invalid"}

and

[2021-03-15T16:49:05,736][WARN ][filewatch.sincedbcollection][main][application-prescription-api-input] sincedb_write: C:/logstash-7.11.1/data/plugins/inputs/file/.sincedb_d41d8cd98f00b204e9800998ecf8427e error: {:exception=>Java::JavaLang::RuntimeException, :message=>"unknown IOException: java.io.IOException: The handle is invalid"}
kares commented 3 years ago

Thanks Rishie, interesting issue - however the PR is not a good solution (ideally we want to fix the underlying issue). The description's details (esp. the logged error) is sufficient for an issue report so if you have time we could try a few experiments to resolve/understand the underlying issue.

Believe this code (only used on Windows): https://github.com/logstash-plugins/logstash-input-file/pull/183/files#diff-30ed4826858d9eb6361d311895e1024281a256c05ae31c25592820758d86e074R210 might lead to issues (and potentially exposes a bug in the underlying JRuby library). But I do not think we need the low-level sysopen here.

Could you try running with the following patch:

diff --git lib/filewatch/sincedb_collection.rb lib/filewatch/sincedb_collection.rb
index b7dc19b..62881b7 100644
--- lib/filewatch/sincedb_collection.rb
+++ lib/filewatch/sincedb_collection.rb
@@ -232,7 +232,7 @@ module FileWatch

     # @return expired keys
     def non_atomic_write(time)
-      IO.open(IO.sysopen(@full_path, "w+")) do |io|
+      IO.open(@full_path, "w+") do |io|
         @serializer.serialize(@sincedb, io, time.to_f)
       end
     end

and report back whether you've seen the issue ...

Also if you could pls clarify how often the issue surfaces and what kind of Windows + FS storage are you using.

fishie commented 3 years ago

Hi! Of course! I don't have that much time anymore because of the baby but I tried your change in our test server and it crashed with "Error: no implicit conversion of String into Integer". I guess open takes an Integer? I have never done any Ruby programming so I'm pretty lost here.

Right now I would say it happens a few times a month. But when we had a lot of inputs with empty paths (path => []) it happened everyday. We found out that was not how you're supposed to do it and have removed the inputs with empty paths since then but it still happens every now and then. One obvious problem with having several inputs with path => [] is that they all get the same sincedb filename generated which I guess causes the issue to occur more often, I have no idea. So in the test server I have artificially added a lot of empty inputs with empty paths to try and test this issue.

We have different versions of Windows Server running so I have seen it happen on Windows Server 2012 R2 and Windows Server 2016 Standard. We're slightly above my pay grade here but they're using some sort of virtualized drives with NTFS.

kares commented 3 years ago

Hey, congrats on the :baby: ! Definitely take your time, I do not have a Windows machine at hand atm but might try to reproduce later. It's just that I am not sure how safe your work-around is in terms of "abandoning" the filedb append operation.

Regarding the patch, I am sorry (got interrupted in the middle of my response) the correct one should have been:

diff --git lib/filewatch/sincedb_collection.rb lib/filewatch/sincedb_collection.rb
index b7dc19b..d053b27 100644
--- lib/filewatch/sincedb_collection.rb
+++ lib/filewatch/sincedb_collection.rb
@@ -232,7 +232,7 @@ module FileWatch

     # @return expired keys
     def non_atomic_write(time)
-      IO.open(IO.sysopen(@full_path, "w+")) do |io|
+      File.open(@full_path, "w+") do |io|
         @serializer.serialize(@sincedb, io, time.to_f)
       end
     end
fishie commented 3 years ago

Thank you! No I agree it would be better to find the root cause. Trying the new patch now in our test server, no immediate crashes. Will run it for a few days to see the effect. As a reference point it happened 3 times between midnight and noon today (12 hours) with the old code in the modified test server where it happens more often.

I guess we are kind of rare because we're running on Windows servers and we are using start_position => beginning. If we use start_position => end, which is default, then of course there's no problem when the plugin restarts. But we chose beginning because if for example Logstash is down, or just stopped, for 5 hours for some reason then we would lose 5 hours of logs, at least that was our reasoning, maybe it's incorrect.

fishie commented 3 years ago

The exceptions haven't occurred for over 24 hours! Looks very promising. I changed the pull request to your change instead and will update all our servers with the patch. I'll report back on Monday (if I don't forget) with results.

kares commented 3 years ago

Great, thanks for the update. Let's wait a bit than and we shall include the updated PR in a next patch release.

fishie commented 3 years ago

Looked through the logs and found no issues on any of our servers! The best proof is that it hasn't happened for 5 days on the test server where it used to happen several times a day with the old code. 👍

kares commented 3 years ago

That's great, thanks for confirming. I think this is challenging to test and potentially only on Windows so I am fine to ship this as is in the next patch release. Could you please update the version and add a changelog entry into CHANGELOG.md for the new version, in this PR.

mtzwn commented 3 years ago

We have experienced the same issue in our Windows Server environment since January 2021, after an upgrade of logstash-input-file plugin (bin/logstash-plugin update logstash-input-file) it looks to fix an issue. Previously pipelines crashed after few (1-2) hours of running with error message "The handle is invalid". Right now logstash is stable for over 20 hours. Thanks!

I noticed this issue first after an upgrading from LS 7.8.2 to 7.10.2