Same here, it really sucks. The value of value_split is interpolated into /((?:\ |[^, PLACEHOLDER])+)[PLACEHOLDER](?:"([^"]+)"|'([^']+)'|((?:\ |[^, ])+))/ (PLACEHOLDER marks where the value is inserted), which doesn't offer a lot of control over how it works. It's not even possible to hack around it, because the value is placed twice in the regex. A simple exact_value_split option would totally do the job, so you could use =>, ==, ", " and the like as split values...
Ok, the regex looks different; GitHub removed some stuff when I posted the message. Here's a picture of the regex:
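For illustration, here is a minimal Ruby sketch (not the plugin's actual code) of the limitation: because value_split is dropped into a character class, a multi-character separator such as => matches either character on its own, never the two-character sequence.

value_split = "=>"
re = Regexp.new("[#{value_split}]")  # => /[=>]/ matches '=' OR '>', not the literal "=>"
p "a=>b".split(re)                   # => ["a", "", "b"] -- split once per character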
I am having this exact problem (except that my field_split is " " and I have spaces in my quoted value strings). Is there any work-around for this?
I needed it for nginx access logs; I ended up with value_split => '>' and field_split => ', '. For nginx access logs it seems to work for now, but yeah, it's not really robust...
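As a rough sketch of why that workaround holds up (the nginx log line below is an assumption, not taken from this thread): as long as neither '>' nor ', ' appears inside the values, the split is unambiguous.

# Hypothetical nginx access-log line formatted as "key> value" pairs:
line = 'remote_addr> 10.0.0.1, request> GET /index.html HTTP/1.1, status> 200'
line.split(', ').each { |pair| p pair.split('> ', 2) }
# ["remote_addr", "10.0.0.1"]
# ["request", "GET /index.html HTTP/1.1"]
# ["status", "200"]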
Looking into this. Here's what I have so far:
def register
  @trim_re = Regexp.new("[#{@trim}]") if @trim
  @trimkey_re = Regexp.new("[#{@trimkey}]") if @trimkey

  # Backslash-escape any regex metacharacters in the configured separators
  # before interpolating them into the scan pattern.
  regex_reserved = ['!', '$', '^', '*', '(', ')', '[', ']', '?', '|', '+', '.', '\\']
  field_split_clean = ""
  @field_split.split('').each do |c|
    field_split_clean += (regex_reserved.include?(c) ? "\\" : "") + c
  end
  value_split_clean = (regex_reserved.include?(@value_split) ? "\\" : "") + @value_split

  # Value alternatives: double-quoted, single-quoted, optionally bracketed,
  # or an unquoted run ending at the next "key<value_split>" or end of string.
  value_rx_string = "(?:\"([^\"]+)\"|'([^']+)'"
  value_rx_string += "|\\(([^\\)]+)\\)|\\[([^\\]]+)\\]" if @include_brackets
  value_rx_string += "|((?:\\\\ )|.*?)(?:(?=" + field_split_clean + "(?:(?:\\\\ |[^" + field_split_clean + value_split_clean + "])+)" + value_split_clean + ")|$))"

  @scan_re = Regexp.new("[" + field_split_clean + "]?((?:\\\\ |[^" + field_split_clean + value_split_clean + "])+)\\s*[" + value_split_clean + "]\\s*" + value_rx_string)
  @value_split_re = /[#{@value_split}]/
end
It almost works; however, 6 of 37 tests are failing. Still digging, but any suggestions would be welcome.
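For context, here's a boiled-down sketch (not the plugin's real regex or test suite) of the behaviour being aimed for with field_split ',' and value_split '=': a quoted value should survive a field separator inside the quotes.

msg = 'sentence="the quick brown fox, jumped over the duck", author="Mark Walkom"'
msg.scan(/([^,=]+)=\s*(?:"([^"]+)"|'([^']+)'|([^,]+))/) do |key, *values|
  p [key.strip, values.compact.first]
end
# ["sentence", "the quick brown fox, jumped over the duck"]
# ["author", "Mark Walkom"]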
Any updates here? I don't see it working in 5.0 either. I also tried updating the plugin, but there were no updates available; the kv filter plugin is at version 3.1.1.
My input looks like this: key1=55, key2=qqq\,www\\eee, key3=value3
The escaped comma in "qqq\,www\\eee" is not honored, and we end up losing the trailing "www\\eee" part of the value for key2.
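For what it's worth, a minimal sketch of what honoring escaped separators would take: a negative lookbehind on the backslash, which the interpolated character class cannot express (this is an illustration, not the kv filter's code).

# Hypothetical escape-aware field split: a comma preceded by a backslash is
# kept as part of the value instead of acting as a separator.
input = 'key1=55, key2=qqq\,www\\\\eee, key3=value3'
p input.split(/(?<!\\),\s*/)
# ["key1=55", "key2=qqq\\,www\\\\eee", "key3=value3"]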
I just tried @jordansissel's example and it works fine. By the way, trim => "\"" is useless (it's done by default). I get this:
"sentence" => "the quick brown fox, jumped over the duck",
"author" => "Mark Walkom"
@shashankcg If you want the field separator to be ignored when it is escaped, that is a different need from the one expressed by @jordansissel in this issue. If that is precisely what you need, I invite you to open a separate issue for it.
Otherwise, you can reach your goal if your input is key1=55, key2="qqq,www\\\\eee", key3=value3
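A quick illustration of that suggestion, using a simplified quote-aware pattern rather than the plugin's real scan regex (the backslashes are dropped from the sample value for readability):

# Once the value is quoted, its embedded comma is no longer treated as a field separator.
input = 'key1=55, key2="qqq,www eee", key3=value3'
input.scan(/(\w+)=(?:"([^"]+)"|([^,]+))/) do |key, quoted, plain|
  p [key, quoted || plain]
end
# ["key1", "55"]
# ["key2", "qqq,www eee"]
# ["key3", "value3"]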
@kiranmai444 you should post that to https://discuss.elastic.co
Any news regarding this issue?
This works:
input {
  stdin {}
}
filter {
  kv {
    trim_key => "\"\ \(\)"
    field_split => ","
    value_split => ":"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
{
      "sentence" => "the quick brown fox, jumped over the duck",
        "author" => "Mark Walkom",
       "message" => "\"sentence\": \"the quick brown fox, jumped over the duck\", \"author\": \"Mark Walkom\"",
      "@version" => "1",
          "host" => "mbp15r.local",
    "@timestamp" => 2019-08-05T20:13:41.410Z
}
Closing.
(This issue was originally filed by @markwalkom at https://github.com/elastic/logstash/issues/2458)
If I have a KV document similar to this:
"sentence": "the quick brown fox, jumped over the duck", "author": "Mark Walkom"
And an LS config like this:
The sentence field is split at the first comma, which ends up dividing the string value and losing data in the output. Ideally it shouldn't do this, and my sentence field would stay complete.
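To make the failure concrete, here is a rough Ruby reproduction (assuming field_split "," and value_split ":"; this naive split illustrates the symptom and is not the kv filter's actual code):

# Naive comma/colon splitting breaks the quoted sentence at its first comma,
# and the rest of the value is lost.
doc = '"sentence": "the quick brown fox, jumped over the duck", "author": "Mark Walkom"'
doc.split(',').each { |pair| p pair.split(':', 2) }
# ["\"sentence\"", " \"the quick brown fox"]
# [" jumped over the duck\""]
# [" \"author\"", " \"Mark Walkom\""]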