logstash-plugins / logstash-filter-kv

Apache License 2.0

kv filter incorrectly parses messages that contain keys with no value #65

Closed yaauie closed 6 years ago

yaauie commented 6 years ago

In https://github.com/elastic/logstash/issues/9786, @eatroshkin reports that logstash-filter-kv v4.1.1 was not correctly parsing messages that had valueless keys:

Starting from version 5.6.9, Logstash incorrectly parses messages that contain keys with no value. This can be reproduced with the following config:

input { stdin { } }
filter {
      kv {
        source => "message"
        field_split => "\t"
        value_split => "="
        include_brackets => true
      }
}
output {
  stdout { codec => rubydebug }
}

Result of parsing the tab-separated message a=11 b= c=33 d=44:

{
    "a" => "11",
    "b" => "c=33",
    "d" => "44",
    "message" => "a=11\tb=\tc=33\td=44"
}

Logstash 5.6.8 and earlier parse the same message correctly:

{
    "a" => "11",
    "c" => "33",
    "d" => "44",
    "message" => "a=11\tb=\tc=33\td=44"
}

OS: Ubuntu trusty, Logstash installed from this repository deb https://artifacts.elastic.co/packages/5.x/apt stable main

I have not yet validated whether the recently released v4.1.2, which included a fix related to over-greedy captures, also resolves this issue.

yodog commented 6 years ago

https://discuss.elastic.co/t/kv-plugin-not-parsing-null-value/138751

i just upgraded to 4.1.2 and can confirm that it is not fixed.

still can't parse the iptables log below

IN=eth0 OUT= MAC=00:50:56:9a:13:2a:02:1f:a0:00:0d:01:08:00 SRC=5.188.207.7 DST=8.8.8.8 LEN=60 TOS=0x08 PREC=0x20 TTL=50 ID=36130 DF PROTO=TCP SPT=31514 DPT=993 WINDOW=29200 RES=0x00 SYN URGP=0

logstash-plain.log is being flooded with the error generated by the empty OUT=

[2018-07-05T12:20:26,297][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"iptables-2018.07.05", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x85585ea>], :response=>{"index"=>{"_index"=>"iptables-2018.07.05", "_type"=>"doc", "_id"=>"gPsHa2QB7yvTzEaiu4if", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [OUT]", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:72"}}}}}

yodog commented 6 years ago

also, https://github.com/logstash-plugins/logstash-filter-kv/issues/37

yaauie commented 6 years ago

After rather extensive digging, I've found no way to implement unquoted "empty value" in the presence of the existing lenient-whitespace behaviour; since we have long allowed whitespace to surround the value-split, a space after the value-split is interpreted by the parser to be just an optional whitespace before the value, and changing this behaviour would be a breaking change.
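
The effect of the lenient-whitespace rule can be reproduced outside Logstash. The sketch below is a simplified Python stand-in, not the plugin's Ruby parser: the regular expression and the name `parse_lenient` are assumptions made purely for illustration. It allows optional whitespace on both sides of the value split and requires a non-empty unquoted value, which is enough to show the tab after `b=` being consumed as padding.

```python
import re

# Assumed, simplified pattern (not the plugin's real one):
#   key, optional whitespace, "=", optional whitespace, non-empty value.
# Fields are separated by tabs, as in the reported config.
LENIENT_PAIR = re.compile(r"(?P<key>[^\t=]+)\s*=\s*(?P<value>[^\t]+)")


def parse_lenient(message):
    """Collect key/value pairs, tolerating whitespace around the value split."""
    return {m.group("key"): m.group("value") for m in LENIENT_PAIR.finditer(message)}


message = "a=11\tb=\tc=33\td=44"
result = parse_lenient(message)
print(result)  # {'a': '11', 'b': 'c=33', 'd': '44'}
```

Because the tab following `b=` matches the optional whitespace, the value capture starts at `c=33`, which mirrors the output in the original report.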

I've attempted to use lookaround assertions in the unquoted-value capture, but have not been met with success; attempts to bail when the unquoted value it captures looks like a key-value pair make the parser recursive.

If there were an option like whitespace => strict that did not allow meaningless whitespace around the value split, we could easily capture empty values in your input as empty. Since it's a trivial opt-in feature, I'll open a PR for that shortly.
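
As a rough illustration of what a strict mode could look like, here is a Python sketch under stated assumptions: `parse_strict` and its pattern are hypothetical and are not taken from the plugin. No whitespace is tolerated around the value split, and an unquoted value may be empty. It is applied to the beginning of the iptables line reported earlier in this thread, using a space as the field separator.

```python
import re

# Assumed strict pattern: key, "=", then a possibly empty value.
# Nothing is allowed between the key, the "=" and the value.
STRICT_PAIR = re.compile(r"(?P<key>[^ =]+)=(?P<value>[^ ]*)")


def parse_strict(message):
    """Collect key/value pairs; a key followed directly by a separator gets ''."""
    return {m.group("key"): m.group("value") for m in STRICT_PAIR.finditer(message)}


sample = "IN=eth0 OUT= MAC=00:50:56:9a:13:2a:02:1f:a0:00:0d:01:08:00 SRC=5.188.207.7"
parsed = parse_strict(sample)
print(parsed["OUT"] == "")  # True
print(parsed["MAC"])        # 00:50:56:9a:13:2a:02:1f:a0:00:0d:01:08:00
```

With the padding removed from the pattern, the space after `OUT=` can only be a field separator, so `OUT` is captured as an empty string and `MAC` keeps its own value.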