logstash-plugins / logstash-filter-kv

KV filter splitting on a field_split value within data #9

Closed jordansissel closed 5 years ago

jordansissel commented 9 years ago

(This issue was originally filed by @markwalkom at https://github.com/elastic/logstash/issues/2458)


If I have a KV document similar to this:

"sentence": "the quick brown fox, jumped over the duck", "author": "Mark Walkom"

And a Logstash config like this:

input {
  stdin {}
}

filter {
  kv {
    trim => "\""
    trimkey => "\"\ \(\)"
    field_split => ","
    value_split => ":"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

The sentence field is split at the first comma, which cuts the string value short and loses the rest of the data in the output:

$ echo ""sentence": "the quick brown fox, jumped over the duck", "author": "Mark Walkom""|bin/logstash agent -f kvbug.conf
Using milestone 2 filter plugin 'kv'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.2/plugin-milestones {:level=>:warn}
{
       "message" => "sentence: the quick brown fox, jumped over the duck, author: Mark Walkom",
      "@version" => "1",
    "@timestamp" => "2015-01-28T02:47:22.726Z",
          "host" => "bender.local",
      "sentence" => " the quick brown fox",
        "author" => " Mark Walkom"
}

Ideally it shouldn't do this and my sentence field would be complete.
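
The truncation can be reproduced outside Logstash with a small Ruby sketch. This is an illustration only, not the plugin's implementation: `naive_kv` and `quote_aware_kv` are hypothetical names, and the quote-aware regex assumes double-quoted keys and values with no escaped quotes inside them. Note that the `message` field in the output above arrives without its double quotes, so the first input below mirrors that.

```ruby
# Illustration only; neither method is part of logstash-filter-kv.

# Splits on every comma, so a comma inside a value cuts the value short.
def naive_kv(line)
  line.split(",").each_with_object({}) do |pair, out|
    key, value = pair.split(":", 2)
    next if value.nil? # fragment without a key, e.g. " jumped over the duck"
    out[key.strip] = value.strip
  end
end

# Treats each double-quoted run as one token, commas included.
def quote_aware_kv(line)
  line.scan(/"([^"]+)"\s*:\s*"([^"]*)"/).to_h
end

unquoted = 'sentence: the quick brown fox, jumped over the duck, author: Mark Walkom'
quoted   = '"sentence": "the quick brown fox, jumped over the duck", "author": "Mark Walkom"'

p naive_kv(unquoted)
p quote_aware_kv(quoted)
```

In the first call the text after the comma has no key of its own and is dropped; in the second, the quotes keep the whole sentence together.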

ghost commented 9 years ago

Same here. Really sucks. The value of value_split is put into /((?:\ |[^, PLACEHOLDER])+)[PLACEHOLDER](?:"([^"]+)"|'([^']+)'|((?:\ |[^, ])+))/, which doesn't offer much control over how it works. It's not even possible to hack around it, because the value is placed twice in the regex. A simple exact_value_split option would totally do the job, so you could use =>, ==, or ", " as split values.

ghost commented 9 years ago

OK, the regex looks different; GitHub removed some stuff when posting the message. Here's a picture of the regex: [image: screen shot 2015-08-19 at 14 44 19]

robyn-kozierok commented 9 years ago

I am having this exact problem (except that my field_split is " " and I have spaces in my quoted value strings). Is there any work-around for this?

ghost commented 9 years ago

I needed it for nginx access logs and ended up with value_split => '>' and field_split => ', '. For nginx access logs it seems to work for now, but it's not really robust.

AlexBaker- commented 8 years ago

Looking into this. Here's what I got thus far:

  def register
    @trim_re = Regexp.new("[#{@trim}]") if @trim
    @trimkey_re = Regexp.new("[#{@trimkey}]") if @trimkey

    regex_reserved = ['!', '$', '^', '*', '(', ')', '[', ']', '?', '|', '+', '.', '\\']
    field_split_clean = ""
    @field_split.split('').each do|c|
      field_split_clean += (regex_reserved.include?(c) ? "\\" : "") + c
    end
    value_split_clean = (regex_reserved.include?(@value_split) ? "\\" : "") + @value_split

    valueRxString = "(?:\"([^\"]+)\"|'([^']+)'"
    valueRxString += "|\\(([^\\)]+)\\)|\\[([^\\]]+)\\]" if @include_brackets
    valueRxString += "|((?:\\\\ )|.*?)(?:(?=" + field_split_clean + "(?:(?:\\\\ |[^" + field_split_clean + value_split_clean + "])+)" + value_split_clean + ")|$))"
    @scan_re = Regexp.new("[" + field_split_clean + "]?((?:\\\\ |[^" + field_split_clean + value_split_clean + "])+)\s*[" + value_split_clean + "]\s*" + valueRxString)
    @value_split_re = /[#{@value_split}]/
  end

It almost works, however 6 of 37 tests are failing. Still digging, but any suggestions would be welcome.
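
For the escaping step, Ruby's built-in `Regexp.escape` might be simpler than a hand-maintained list of reserved characters. The following is a minimal sketch, not the plugin's code: `escape_for_char_class` is a hypothetical helper name, the sample split values are made up, and it assumes each character of the setting is treated as an individual separator inside a character class.

```ruby
# Hypothetical helper (not part of logstash-filter-kv): escape every
# character of a split setting so it can sit inside a regex character class.
def escape_for_char_class(chars)
  chars.each_char.map { |c| Regexp.escape(c) }.join
end

field_split = ",|" # sample values, chosen only for illustration
value_split = ":"

fs = escape_for_char_class(field_split)
vs = escape_for_char_class(value_split)

# A key is a run of characters that are neither field nor value separators.
key_re = Regexp.new("([^#{fs}#{vs}]+)")

p "a,b|c:d".scan(key_re).flatten
```

One related detail worth checking in the draft above: inside a double-quoted Ruby string, `"\s"` is a literal space character, so `"\s*"` builds a pattern that matches spaces only, while `"\\s*"` would produce the regex whitespace class.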

shashankcg commented 8 years ago

Any updates here? I don't see it working in 5.0 either. I also tried updating the plugin, but no updates were available. The kv filter plugin is at version 3.1.1.

My input looks like this: key1=55, key2=qqq\,www\\eee, key3=value3

The escaped comma in "qqq\,www\\eee" is not honored, so the value is split there and key2 loses the trailing "www\\eee".
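
What is being asked for here can be sketched in plain Ruby with a negative lookbehind. This is an assumption about the desired behavior, not an existing kv filter option; `split_unescaped_commas` is a hypothetical name. The sketch is deliberately simple: a comma preceded by an escaped backslash (`\\,`) would still be treated as escaped, which a fuller parser would need to handle.

```ruby
# Hypothetical helper, not a kv filter option: split on commas that are
# not immediately preceded by a backslash, eating any following whitespace.
def split_unescaped_commas(line)
  line.split(/(?<!\\),\s*/)
end

# Single-quoted heredoc: backslashes are kept exactly as typed.
input = <<~'LINE'.chomp
  key1=55, key2=qqq\,www\\eee, key3=value3
LINE

fields = split_unescaped_commas(input)
pairs  = fields.map { |f| f.split("=", 2) }.to_h

p pairs
```

With this input the escaped comma stays inside the value of key2 instead of ending it.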

fbaligand commented 8 years ago

I just tried @jordansissel's example and it works fine. By the way, trim => "\"" is unnecessary (it is done by default). I get this:

      "sentence" => "the quick brown fox, jumped over the duck",
        "author" => "Mark Walkom"

fbaligand commented 8 years ago

@shashankcg If you want the field separator to be ignored when it is escaped, that is a different need from the one expressed by @jordansissel in this issue. If that is precisely what you need, I invite you to open a separate issue for it.

Otherwise, you can reach your goal if your input is key1=55, key2="qqq,www\\\\eee", key3=value3

markwalkom commented 7 years ago

@kiranmai444 you should post that to https://discuss.elastic.co

sevdog commented 5 years ago

Any news regarding this issue?

colinsurprenant commented 5 years ago

This works

input {
  stdin {}
}

filter {
  kv {
    trim_key => "\"\ \(\)"
    field_split => ","
    value_split => ":"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

{
      "sentence" => "the quick brown fox, jumped over the duck",
        "author" => "Mark Walkom",
       "message" => "\"sentence\": \"the quick brown fox, jumped over the duck\", \"author\": \"Mark Walkom\"",
      "@version" => "1",
          "host" => "mbp15r.local",
    "@timestamp" => 2019-08-05T20:13:41.410Z
}

Closing.