logstash-plugins / logstash-filter-translate

Translate filter for Logstash
Apache License 2.0

Refactor: leverage scheduler mixin #93

Closed kares closed 2 years ago

kares commented 2 years ago

part of https://github.com/elastic/logstash/issues/13773

see https://github.com/logstash-plugins/logstash-mixin-scheduler/pull/1

nicpenning commented 1 year ago

Greetings! Is there any chance this commit could prevent a large YAML file from being parsed?

I can't get past Logstash 8.3.2 because of the translate filter issue:

[2023-04-05T11:06:21,961][ERROR][logstash.javapipeline    ][timberlorry-filebeat-dhcp] Pipeline error {:pipeline_id=>"timberlorry-filebeat-dhcp", :exception=>#<LogStash::Filters::Dictionary::DictionaryFileError: Translate: The incoming YAML document exceeds the limit: 3145728 code points. when loading dictionary file at C:\ProgramData\elastic\universalConfig\dhcp_enrich.yml>, :backtrace=>["org.yaml.snakeyaml.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java:342)", "org.yaml.snakeyaml.scanner.ScannerImpl.checkToken(ScannerImpl.java:263)", "org.yaml.snakeyaml.parser.ParserImpl$ParseBlockMappingKey.produce(ParserImpl.java:662)", "org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:185)", "org.yaml.snakeyaml.parser.ParserImpl.getEvent(ParserImpl.java:195)", "org.jruby.ext.psych.PsychParser.parse(PsychParser.java:210)", "org.jruby.ext.psych.PsychParser$INVOKER$i$parse.call(PsychParser$INVOKER$i$parse.gen)", "org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:844)", "org.jruby.ir.targets.indy.InvokeSite.invoke(InvokeSite.java:208)", "C_3a_.ProgramData.elastic.logstash.vendor.jruby.lib.ruby.stdlib.psych.RUBY$method$parse_stream$0(C:/ProgramData/elastic/logstash/vendor/jruby/lib/ruby/stdlib/psych.rb:460)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:112)", "org.jruby.ir.targets.indy.InvokeSite.invoke(InvokeSite.java:208)", "C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.dictionary.yaml_file.RUBY$method$read_file_into_dictionary$0(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/dictionary/yaml_file.rb:19)", 
"C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.dictionary.yaml_file.RUBY$method$read_file_into_dictionary$0$__VARARGS__(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/dictionary/yaml_file.rb:14)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:112)", "org.jruby.ir.targets.indy.InvokeSite.invoke(InvokeSite.java:208)", "C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.dictionary.file.RUBY$method$merge_dictionary$0(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/dictionary/file.rb:84)", "C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.dictionary.file.RUBY$method$merge_dictionary$0$__VARARGS__(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/dictionary/file.rb:81)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:112)", "org.jruby.RubyMethod.call(RubyMethod.java:132)", "org.jruby.RubyMethod$INVOKER$i$call.call(RubyMethod$INVOKER$i$call.gen)", "org.jruby.ir.targets.indy.InvokeSite.invoke(InvokeSite.java:208)", 
"C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.dictionary.file.RUBY$method$load_dictionary$0(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/dictionary/file.rb:56)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:112)", "org.jruby.ir.targets.indy.InvokeSite.invoke(InvokeSite.java:208)", "C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.dictionary.file.RUBY$method$initialize$0(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/dictionary/file.rb:50)", "C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.dictionary.file.RUBY$method$initialize$0$__VARARGS__(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/dictionary/file.rb:34)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:112)", "org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:329)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:87)", "org.jruby.RubyClass.newInstance(RubyClass.java:911)", "org.jruby.RubyClass$INVOKER$i$newInstance.call(RubyClass$INVOKER$i$newInstance.gen)", "org.jruby.ir.targets.indy.InvokeSite.invoke(InvokeSite.java:208)", 
"C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.dictionary.file.RUBY$method$create$0(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/dictionary/file.rb:14)", "C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.dictionary.file.RUBY$method$create$0$__VARARGS__(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/dictionary/file.rb:12)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:112)", "org.jruby.ir.targets.indy.InvokeSite.invoke(InvokeSite.java:208)", "C_3a_.ProgramData.elastic.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_translate_minus_3_dot_4_dot_0.lib.logstash.filters.translate.RUBY$method$register$0(C:/ProgramData/elastic/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-translate-3.4.0/lib/logstash/filters/translate.rb:184)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:152)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:148)", "org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)", "org.jruby.RubyClass.finvoke(RubyClass.java:572)", "org.jruby.runtime.Helpers.invoke(Helpers.java:649)", "org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:348)", "org.logstash.config.ir.compiler.FilterDelegatorExt.doRegister(FilterDelegatorExt.java:89)", "org.logstash.config.ir.compiler.AbstractFilterDelegatorExt.register(AbstractFilterDelegatorExt.java:75)", 
"org.logstash.config.ir.compiler.AbstractFilterDelegatorExt$INVOKER$i$0$0$register.call(AbstractFilterDelegatorExt$INVOKER$i$0$0$register.gen)", "org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:840)", "org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:237)", "org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:255)", "C_3a_.ProgramData.elastic.logstash.logstash_minus_core.lib.logstash.java_pipeline.RUBY$block$register_plugins$1(C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/java_pipeline.rb:237)", "org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:151)", "org.jruby.runtime.BlockBody.yield(BlockBody.java:106)", "org.jruby.runtime.Block.yield(Block.java:188)", "org.jruby.RubyArray.each(RubyArray.java:1865)", "C_3a_.ProgramData.elastic.logstash.logstash_minus_core.lib.logstash.java_pipeline.RUBY$method$register_plugins$0(C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/java_pipeline.rb:236)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:165)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:185)", "org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:278)", "C_3a_.ProgramData.elastic.logstash.logstash_minus_core.lib.logstash.java_pipeline.RUBY$method$maybe_setup_out_plugins$0(C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/java_pipeline.rb:608)", "C_3a_.ProgramData.elastic.logstash.logstash_minus_core.lib.logstash.java_pipeline.RUBY$method$maybe_setup_out_plugins$0$__VARARGS__(C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/java_pipeline.rb:605)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:112)", "org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:248)", "org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:255)", 
"C_3a_.ProgramData.elastic.logstash.logstash_minus_core.lib.logstash.java_pipeline.RUBY$method$start_workers$0(C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/java_pipeline.rb:249)", "C_3a_.ProgramData.elastic.logstash.logstash_minus_core.lib.logstash.java_pipeline.RUBY$method$start_workers$0$__VARARGS__(C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/java_pipeline.rb:245)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:112)", "org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:248)", "org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:255)", "C_3a_.ProgramData.elastic.logstash.logstash_minus_core.lib.logstash.java_pipeline.RUBY$method$run$0(C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/java_pipeline.rb:194)", "C_3a_.ProgramData.elastic.logstash.logstash_minus_core.lib.logstash.java_pipeline.RUBY$method$run$0$__VARARGS__(C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/java_pipeline.rb:189)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:112)", "org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:248)", "org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:255)", "C_3a_.ProgramData.elastic.logstash.logstash_minus_core.lib.logstash.java_pipeline.RUBY$block$start$1(C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/java_pipeline.rb:146)", "org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:141)", "org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:64)", "org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:58)", "org.jruby.runtime.Block.call(Block.java:143)", "org.jruby.RubyProc.call(RubyProc.java:309)", "org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:107)", "java.base/java.lang.Thread.run(Thread.java:833)"], 
"pipeline.sources"=>["central pipeline management"], :thread=>"#<Thread:0x6169e927@C:/ProgramData/elastic/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:51 run>"}

Our YAML file has 23559 lines and is 3,831,983 characters long.

I am unsure what a code point is, and I do not know what else to try at this point.
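For readers with the same question: a code point is a single Unicode character value, so for a UTF-8 text file the code point count is at most the byte count and equals it only for pure ASCII content. The following is a minimal Ruby sketch for checking a dictionary against the 3145728 figure from the error message. The helper names (`code_point_count`, `exceeds_limit?`, `file_code_point_count`) are hypothetical and are not part of Logstash or the translate filter. It is also an assumption that the parser counts code points the same way Ruby does.

```ruby
# Illustrative sketch only: these helpers are hypothetical and are not part of
# Logstash, the translate filter, or SnakeYAML.

# 3 * 1024 * 1024 = 3145728, the limit quoted in the error message.
CODE_POINT_LIMIT = 3 * 1024 * 1024

# Number of Unicode code points in a string (not the number of bytes).
def code_point_count(text)
  text.each_codepoint.count
end

# True when the text holds more code points than the limit allows.
def exceeds_limit?(text, limit = CODE_POINT_LIMIT)
  code_point_count(text) > limit
end

# Reads a file as UTF-8 and returns its code point count.
def file_code_point_count(path)
  code_point_count(File.read(path, encoding: "UTF-8"))
end
```

A file of 3,831,983 characters, as reported above, is over the 3145728 limit regardless of its line count, which is consistent with the error in the log.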

andsel commented 1 year ago

Hi @nicpenning, this change only concerns the use of a utility gem that exposes the same scheduler functionality across all Logstash plugins, so I don't think it relates directly to your issue. Please open a bug report with a reproducer for your use case, including at least the relevant part of your pipeline config. If the YAML dictionary you are using is needed to reproduce the problem, please attach it.

Please do not comment on closed/merged PRs; open a new issue instead and tag me in it.

mashhurs commented 1 year ago

@nicpenning, it sounds like your snakeyaml dependency somehow got updated to version 1.32 or later. Logstash v8.3.2 uses 1.29, which should not impose the 3M limit. Snakeyaml introduced this limit in 1.32 (reference: https://bitbucket.org/snakeyaml/snakeyaml/wiki/Changes). See my recent change to logstash-filter-useragent for more on what is happening.

nicpenning commented 1 year ago

I did find the correct issue and posted there. I can move away from the discussion, thank you!

nicpenning commented 1 year ago

@mashhurs found my issue here: https://github.com/logstash-plugins/logstash-filter-translate/issues/96