logstash-plugins / logstash-filter-kv

Not all fields are being split - change in behavior from 4.1.1 to 4.2.1 #74

Closed BobBlank12 closed 5 years ago

BobBlank12 commented 5 years ago

EXAMPLES:

Test input data:

<14>datetime=2018-10-18T12:59:10-0400|aField=valueofacode|xyz=valueofacc|IP=192.168.1.100|MyType=\MyType Value\|Error=\Error Value\|RetCode=123|Dir=Northwest|headerFrom=|Sender=johndoe@mail.services.com|Rcpt=jane.doe@email.biz|Act=Hello|RejInfo=\rej infor value\|TlsVer=TLSv1.2|Cphr=THIS_IS_MY_CPHR_256

Logstash conf file:

input { stdin {} }
filter {
  kv {
    field_split => "|"
    trim_value => "[\\]"
  }
}
output { stdout { codec => "rubydebug" } }

**Logstash 6.2.4 with plugin: logstash-filter-kv (4.1.1) 6.2.4 output**

```
{
        "@version" => "1",
            "host" => "Bobs-MacBook-Pro-2.local",
    "<14>datetime" => "2018-10-18T12:59:10-0400",
          "aField" => "valueofacode",
             "xyz" => "valueofacc",
              "IP" => "192.168.1.100",
           "Error" => "Error Value",
         "RetCode" => "123",
             "Act" => "Hello",
             "Dir" => "Northwest",
            "Rcpt" => "jane.doe@email.biz",
         "RejInfo" => "rej infor value",
          "Sender" => "johndoe@mail.services.com",
          "MyType" => "MyType Value",
          "TlsVer" => "TLSv1.2",
      "@timestamp" => 2018-10-19T00:31:54.401Z,
            "Cphr" => "THIS_IS_MY_CPHR_256",
         "message" => "<14>datetime=2018-10-18T12:59:10-0400|aField=valueofacode|xyz=valueofacc|IP=192.168.1.100|MyType=\\MyType Value\\|Error=\\Error Value\\|RetCode=123|Dir=Northwest|headerFrom=|Sender=johndoe@mail.services.com|Rcpt=jane.doe@email.biz|Act=Hello|RejInfo=\\rej infor value\\|TlsVer=TLSv1.2|Cphr=THIS_IS_MY_CPHR_256"
}
```

**Logstash 6.4.2 with plugin: logstash-filter-kv (4.2.1) 6.4.2 output**

```
{
          "Sender" => "johndoe@mail.services.com",
            "Cphr" => "THIS_IS_MY_CPHR_256",
             "xyz" => "valueofacc",
            "host" => "Bobs-MacBook-Pro-2.local",
        "@version" => "1",
    "<14>datetime" => "2018-10-18T12:59:10-0400",
            "Rcpt" => "jane.doe@email.biz",
              "IP" => "192.168.1.100",
      "@timestamp" => 2018-10-19T00:31:00.902Z,
          "MyType" => "MyType Value\\|Error=\\Error Value\\|RetCode=123",
             "Dir" => "Northwest",
          "aField" => "valueofacode",
         "message" => "<14>datetime=2018-10-18T12:59:10-0400|aField=valueofacode|xyz=valueofacc|IP=192.168.1.100|MyType=\\MyType Value\\|Error=\\Error Value\\|RetCode=123|Dir=Northwest|headerFrom=|Sender=johndoe@mail.services.com|Rcpt=jane.doe@email.biz|Act=Hello|RejInfo=\\rej infor value\\|TlsVer=TLSv1.2|Cphr=THIS_IS_MY_CPHR_256",
             "Act" => "Hello",
         "RejInfo" => "rej infor value\\|TlsVer=TLSv1.2"
}
```

yaauie commented 5 years ago

@BobBlank12 a backslash carries special meaning: it escapes the character that follows it, stripping that character of its own special meaning. In your case, each backslash-pipe sequence causes the parser to treat the pipe as part of the value instead of as a field separator.

Backslash escapes have long been a part of this plugin, although the 4.1.x -> 4.2.x refactor made their handling more consistent. To prevent a backslash from being interpreted as the start of an escape sequence, it must itself be escaped with a preceding backslash.
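
To make the effect concrete, here is a minimal Python sketch of escape-aware splitting. It is not the plugin's actual implementation; `split_escaped` is a hypothetical helper written only to illustrate why an escaped pipe stops acting as a separator, using a shortened version of the input from this issue.

```python
def split_escaped(text, sep="|", esc="\\"):
    """Split text on sep, treating esc followed by any character as a literal pair.

    Hypothetical illustration only; the kv filter's real parser is more involved.
    """
    fields, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == esc and i + 1 < len(text):
            # Keep the escape and the escaped character together in the value,
            # so an escaped separator never ends the field.
            current.append(text[i:i + 2])
            i += 2
        elif ch == sep:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


sample = r"IP=192.168.1.100|MyType=\MyType Value\|Error=\Error Value\|RetCode=123|Dir=Northwest"

naive = sample.split("|")        # ignores escapes: 5 fields
escaped = split_escaped(sample)  # honors escapes: 3 fields

for field in escaped:
    print(field)
```

With escapes honored, `MyType`, `Error`, and `RetCode` collapse into a single field, which mirrors the `MyType` value seen in the 4.2.1 output above.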


If you have control of the shape of your data, I would advise an alternate quoting mechanism. For example, when include_brackets => true (the default), a variety of open/close bracket pairings are supported: angle brackets <...>, square brackets [...], and parens (...).

Otherwise, running the following filter before the kv filter will replace your backslash-quoting with compatible square-bracket quoting:


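filter {
  # replace backslash-quoting in kv input with open/close
  # bracket quoting that is compatible with kv filter plugin:
  mutate {
    gsub => [
      # a backslash immediately after "=" opens a quoted value
      "message", "(?<==)(?:\\)", "[",
      # a backslash immediately before a literal "|" (or at end of line)
      # closes it; the pipe must be escaped, since a bare "|" is regex alternation
      "message", "(?:\\)(?=\||$)", "]"
    ]
  }
}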
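
As a quick sanity check of the rewrite, the two substitutions can be tried outside Logstash. The following Python sketch is an approximation: `requote` is a hypothetical helper, Python's `re` stands in for the regex engine used by `mutate`, and the closing pattern assumes the intent is "a backslash followed by a literal pipe or end of line".

```python
import re


def requote(message):
    """Rewrite backslash-quoted values as square-bracket-quoted values."""
    # A backslash immediately after '=' opens a quoted value.
    message = re.sub(r"(?<==)\\", "[", message)
    # A backslash immediately before a literal '|' or at end of string closes it.
    message = re.sub(r"\\(?=\||$)", "]", message)
    return message


sample = r"IP=192.168.1.100|MyType=\MyType Value\|Error=\Error Value\|RetCode=123"
print(requote(sample))
# prints: IP=192.168.1.100|MyType=[MyType Value]|Error=[Error Value]|RetCode=123
```

After this rewrite no backslash-pipe sequences remain, so every pipe is seen as a field separator again.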
colinsurprenant commented 5 years ago

I believe an appropriate explanation and workaround were provided, so I will go ahead and close the issue. @BobBlank12 feel free to reopen if needed.

BobBlank12 commented 5 years ago

Thank you!