logstash-plugins / logstash-filter-translate

Translate filter for Logstash

Regression using filter with csv file AND a CSV filter in the same config. #70

Closed: tshafeev closed this issue 6 years ago

tshafeev commented 6 years ago

Logstash 5.5.2: after upgrading the plugin to version 3.2.0, we get an error when loading a CSV translation file.

[2018-08-15T11:07:39,930][ERROR][logstash.pipeline        ] Error registering plugin {:plugin=>"#<LogStash::FilterDelegator:0x11ca9d8f @id=\"7d8b4bdad4f2f9d24c6b1b426ab0fa32a37143ed-18\", @klass=LogStash::Filters::Translate, @metric_events=#<LogStash::Instrument::NamespacedMetric:0x51dd582a @metric=#<LogStash::Instrument::Metric:0x7ee8c5bc @collector=#<LogStash::Instrument::Collector:0x391a5d97 @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x7cf999de @store=#<Concurrent::Map:0x0000000006484c entries=2 default_proc=nil>, @structured_lookup_mutex=#<Mutex:0x47de9263>, @fast_lookup=#<Concurrent::Map:0x00000000064850 entries=131 default_proc=nil>>>>, @namespace_name=[:stats, :pipelines, :main, :plugins, :filters, :\"7d8b4bdad4f2f9d24c6b1b426ab0fa32a37143ed-18\", :events]>, @logger=#<LogStash::Logging::Logger:0x143e6697 @logger=#<Java::OrgApacheLoggingLog4jCore::Logger:0x68f17977>>, @filter=<LogStash::Filters::Translate destination=>\"SomeDestination\", dictionary_path=>\"/dictionaries/Lookup.csv\", field=>\"SomeField\", id=>\"7d8b4bdad4f2f9d24c6b1b426ab0fa32a37143ed-18\", enable_metric=>true, periodic_flush=>false, override=>false, refresh_interval=>300, exact=>true, regex=>false, refresh_behaviour=>\"merge\">>", :error=>"undefined method `[]' for #<StringIO:0x1747f9c9>"}
[2018-08-15T11:07:40,144][ERROR][logstash.agent           ] Pipeline aborted due to error {:exception=>#<NoMethodError: undefined method `[]' for #<StringIO:0x1747f9c9>>, :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/plugin.rb:60:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:127:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-translate-3.2.0/lib/logstash/filters/dictionary/csv_file.rb:11:in `initialize_for_file_type'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-translate-3.2.0/lib/logstash/filters/dictionary/file.rb:45:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-translate-3.2.0/lib/logstash/filters/dictionary/file.rb:19:in `create'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-translate-3.2.0/lib/logstash/filters/translate.rb:166:in `register'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:281:in `register_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:292:in `register_plugins'", "org/jruby/RubyArray.java:1613:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:292:in `register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:302:in `start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:226:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:398:in `start_pipeline'"]}

The CSV file content looks like this:

2877,2877 Blabla1
2888,2888 Blabla2 Blabla3
2889,2889 Blabla5 Blabla6

With 3.1.0 it works as expected.

guyboertje commented 6 years ago

Need more information. There are tests that use real CSV file fixtures; we saw no failures.

I installed 3.2.0 on LS 5.5.3. This config operates correctly:

input {
  generator {
    lines => ['222101333', '123456789', 'abc103def', 'xyz301mno']
    count => 1
  }
}

filter {
  translate {
    field       => "[message]"
    destination => "[matched]"
    dictionary_path  => "/elastic/tmp/testing/confs/translate-drop.csv"
    exact       => true
    regex       => true
  }
  if [matched] == "drop" {
    drop {}
  }
}

output {
  stdout { codec => rubydebug {metadata => true} }
}

With results:

[2018-08-15T16:36:50,684][INFO ][logstash.pipeline        ] Starting pipeline {"id"=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>1000}
[2018-08-15T16:36:50,691][INFO ][logstash.pipeline        ] Pipeline main started
{
      "sequence" => 0,
    "@timestamp" => 2018-08-15T15:36:50.713Z,
      "@version" => "1",
          "host" => "Elastics-MacBook-Pro.local",
       "message" => "123456789"
}
{
      "sequence" => 0,
    "@timestamp" => 2018-08-15T15:36:50.715Z,
      "@version" => "1",
          "host" => "Elastics-MacBook-Pro.local",
       "message" => "xyz301mno"
}
[2018-08-15T16:36:50,749][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2018-08-15T16:36:53,711][WARN ][logstash.agent           ] stopping pipeline {:id=>"main"}

CSV file

$ cat /elastic/tmp/testing/confs/translate-drop.csv
100,drop
101,drop
102,drop
103,drop

qikiqi commented 6 years ago

I also stumbled upon this 'Error registering plugin' error while upgrading the ELK stack and the Logstash plugins.

elasticsearch.version: 6.4.0
kibana.version: 6.4.0
logstash.version: 6.4.0

The content of the CSV is formatted along these lines:

$ cat file.csv
from,dest
83,1-20

We are feeding input from http_poller to the translate filter, performing various things with other filters (ruby, elasticsearch, mutate) before outputting the events with logstash-output-elasticsearch.

Unfortunately I haven't been able to produce a minimal example that results in the error. All the minimized configs work as expected, but the 'real' config fails with the following message:

Aug 24 13:10:03 elastic-server logstash[17399]: [2018-08-24T13:10:03,699][ERROR][logstash.pipeline        ] Error registering plugin {:pipeline_id=>"main", :plugin=>"#<LogStash::FilterDelegator:0x3507c1bc @metric_events_out=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 -  name: out value:0, @metric_events_in=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 -  name: in value:0, @metric_events_time=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 -  name: duration_in_millis value:0, @id=\"translate_poller_sessions_filter\", @klass=LogStash::Filters::Translate, @metric_events=#<LogStash::Instrument::NamespacedMetric:0x2bc87a0a>, @filter=<LogStash::Filters::Translate refresh_interval=>3602, field=>\"from\", dictionary_path=>\"/etc/logstash/file.csv\", destination=>\"dest\", override=>false, id=>\"translate_poller_sessions_filter\", enable_metric=>true, periodic_flush=>false, exact=>true, regex=>false, refresh_behaviour=>\"merge\">>", :error=>"undefined method `[]' for #<StringIO:0x51039638>", :thread=>"#<Thread:0x54946c93@/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:46 run>"}
Aug 24 13:10:04 elastic-server logstash[17399]: [2018-08-24T13:10:04,762][ERROR][logstash.pipeline        ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<NoMethodError: undefined method `[]' for #<StringIO:0x51039638>>, :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/plugin.rb:56:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:125:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-translate-3.2.0/lib/logstash/filters/dictionary/csv_file.rb:11:in `initialize_for_file_type'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-translate-3.2.0/lib/logstash/filters/dictionary/file.rb:45:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-translate-3.2.0/lib/logstash/filters/dictionary/file.rb:19:in `create'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-translate-3.2.0/lib/logstash/filters/translate.rb:166:in `register'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:241:in `register_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:252:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:252:in `register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:594:in `maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:262:in `start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:199:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:159:in `block in start'"], :thread=>"#<Thread:0x54946c93@/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:46 run>"}
Aug 24 13:10:04 elastic-server logstash[17399]: [2018-08-24T13:10:04,777][ERROR][logstash.agent           ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}

After downgrading logstash-filter-translate from 3.2.0 to 3.1.0 everything works as expected with the full config.

I'll try my best to produce a minimal working example that results in the aforementioned error.

guyboertje commented 6 years ago

Reviewing the docs, I was reminded of this bit:

The format of the table should be a standard YAML, JSON, or CSV. Make sure you specify any integer-based keys in quotes.

v3.2.0 was refactored to improve the way very large dictionary files are loaded. Previously the entire file was read into memory and then parsed by the standard Ruby CSV library as a single string. Now each line is read as a string and fed into a single CSV parser instance via a rewindable StringIO object.

This means that each record must have the same shape and datatype.
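
That loading strategy looks roughly like the following Ruby sketch (simplified, with an illustrative file path; not the plugin's exact code):

require 'csv'
require 'stringio'

dictionary = {}
io = StringIO.new("")
parser = CSV.new(io)          # one stdlib CSV parser reused for every line

File.foreach("/path/to/dictionary.csv") do |line|
  io.string = line            # swapping in the next line rewinds the StringIO
  key, value = parser.shift   # parse exactly one record from it
  dictionary[key] = value
end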

I have retested my example posted above with three different files:

File 1 (all keys quoted):

"100",drop
"101",drop
"102",drop
"103",drop

File 2 (header row, mixed quoting):

from,dest
100,drop
"101",drop
"102",drop
"103",drop

File 3 (no header, mixed quoting):

100,drop
"101",drop
"102",drop
"103",drop

None reproduced the reported error.

qikiqi commented 6 years ago

Yes, I also read that and tried quoting all integer-based keys (1st column).

Although, thinking back now, I had a varying format, since I didn't quote strings (e.g. the header 'from'). So the file looked something like this:

$ cat file.csv
from,dest
"83",1-20
"88",1-30
"98",1-40

I'll try trimming the file by removing the header row and quoting all the integer-based keys, or maybe quoting both columns; either way, I'll try both.

Will report back on Monday. Have a nice weekend!

qikiqi commented 6 years ago

OK, here is a minimal example that produces the aforementioned error on my computer:

$ /usr/share/logstash/bin/logstash -e 'input {
                      generator {
                        lines => ["100", "101"]
                        count => 1
                        tags => ["2nd"]
                      }
                    }
                    filter {
                      if "1st" in [tags] {
                        csv {
                          separator => "; "
                          quote_char => "|"
                          id => "csv_filter"
                        }
                        #drop { }
                      }
                      else if "2nd" in [tags] {
                        translate {
                          dictionary_path => "/usr/share/logstash/temporary/csv/file.csv"
                          field => "message"
                          destination => "dest"
                          override => false
                          id => "translate_filter"
                        }
                      }
                    }
                    output {
                      stdout { codec => rubydebug {metadata => true} }
                    }'

And the contents of file.csv:

$ cat file.csv
"100",Continue
"101",Switching Protocols

Note that if you comment out the logstash-filter-csv and uncomment the logstash-filter-drop, everything works fine.

In my case, the 'real' config had a similar structure, which resulted in the error. For now I've worked around it by using YAML files instead of CSV files.

Edit: Added the CSV file
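
For reference, a YAML dictionary equivalent to the file.csv above (with the integer-based keys quoted, per the docs note quoted earlier in the thread) would look something like:

"100": Continue
"101": Switching Protocols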

guyboertje commented 6 years ago

@qikiqi

Ahhhh. Thanks for that. Found the problem. When the CSV filter is in the config, the line in the Translate filter's csv_file class that initialises the dictionary load is supposed to call the Ruby CSV library's CSV.new method, but it is actually calling the CSV filter plugin's new method.

It is quite easy to fix.
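
Here is a minimal, self-contained Ruby sketch of that clash (the class bodies are illustrative stand-ins, not the plugins' actual sources); the commented-out ::CSV form shows the kind of fully qualified reference that avoids it:

require 'csv'       # stdlib CSV, defines the top-level ::CSV constant
require 'stringio'

module LogStash
  module Filters
    class CSV                             # stand-in for the csv filter plugin
      def initialize(params)
        @separator = params["separator"]  # expects a config hash
      end
    end

    module Dictionary
      class CsvFile
        def initialize
          io = StringIO.new("")
          # BUG: the bare constant CSV resolves lexically to
          # LogStash::Filters::CSV once the csv filter is loaded, and its
          # initialize then calls params["separator"] on the StringIO:
          CSV.new(io)
          # FIX: ::CSV.new(io) forces the top-level stdlib CSV class.
        end
      end
    end
  end
end

LogStash::Filters::Dictionary::CsvFile.new rescue puts $!
# => undefined method `[]' for #<StringIO:0x...>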