Summarizing the stack trace:
```
[[main]>worker15] ERROR logstash.pipeline - Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash. {"exception"=>"-1", "backtrace"=>["java.util.ArrayList.elementData(ArrayList.java:418)",
...
Exception in thread "[main]>worker15" java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.elementData(ArrayList.java:418)
at java.util.ArrayList.remove(ArrayList.java:495)
at org.logstash.FieldReference.parse(FieldReference.java:37)
at org.logstash.PathCache.cache(PathCache.java:37)
at org.logstash.PathCache.isTimestamp(PathCache.java:30)
at org.logstash.ext.JrubyEventExtLibrary$RubyEvent.ruby_set_field(JrubyEventExtLibrary.java:122)
...
```
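For what it's worth, an index of -1 is what ArrayList.remove(size() - 1) produces when the list is empty, which fits a field name that is an empty string. Below is a minimal standalone Java sketch of that failure mode; it is an assumption about what FieldReference.parse is doing, not the actual Logstash source:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (NOT the real FieldReference.parse) of how an empty field name
// can end in ArrayList.remove(-1): splitting "" on field-reference delimiters
// leaves no path components, so remove(size() - 1) becomes remove(-1).
public class EmptyKeySketch {
    public static void main(String[] args) {
        String fieldName = "";                      // e.g. a key left empty after trimming
        List<String> path = new ArrayList<>();
        for (String part : fieldName.split("[\\[\\]]")) {
            if (!part.isEmpty()) {
                path.add(part);
            }
        }
        // With an empty field name the path list is empty, so this is remove(-1);
        // on Java 8 that is the java.lang.ArrayIndexOutOfBoundsException: -1
        // thrown from ArrayList.elementData, matching the trace above.
        String key = path.remove(path.size() - 1);
        System.out.println(key);
    }
}
```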
I haven't tried to reproduce yet, but I agree this is something strange. It appears to be a bug, but I am not sure what is causing the problem.
@jordansissel so I've been testing for a bit, and I think what's causing the issue is `trim_key`. Once I removed `trim_key` from the kv filter, Logstash doesn't seem to crash anymore. I don't know whether it's any specific characters in our `trim_key` config that are causing the issue, though.
I can't seem to reproduce this on 5.4.0 with your sample data + config:
```
% cat cfg
input { stdin {} }
filter {
  kv {
    trim_key => "<>\[\],`\."
    remove_field => ["\\%{some_field}", "{%{some_field}"]
    include_brackets => false
  }
}
output { stdout { codec => rubydebug } }
/tmp/logstash-5.4.0 % echo 'time=2017-05-24T18:23:02.619100+00:00 severity=INFO pid=7078 method=GET path=/projects/20432/report format=html controller=reports action=show status=200 duration=2304.09 view=2172.47 db=89.51 time=2017-05-24 18:23:00 +0000 ip=216.16.231.94 host=demo.central.miovision.com params={"download_token"=>"1495650173", "report"=>{"format"=>"travel_time", "study_observation_set_ids"=>["", "57", "56", "55", ""], "bin_size"=>"3600"}, "project_id"=>"20432"}' | bin/logstash -f cfg
{
"severity" => "INFO",
"controller" => "reports",
"method" => "GET",
"ip" => "216.16.231.94",
"format" => "html",
"\"project_id\"" => ">\"20432\"}",
"pid" => "7078",
"\"report\"" => ">{\"format\"=>\"travel_time\",",
"message" => "time=2017-05-24T18:23:02.619100+00:00 severity=INFO pid=7078 method=GET path=/projects/20432/report format=html controller=reports action=show status=200 duration=2304.09 view=2172.47 db=89.51 time=2017-05-24 18:23:00 +0000 ip=216.16.231.94 host=demo.central.miovision.com params={\"download_token\"=>\"1495650173\", \"report\"=>{\"format\"=>\"travel_time\", \"study_observation_set_ids\"=>[\"\", \"57\", \"56\", \"55\", \"\"], \"bin_size\"=>\"3600\"}, \"project_id\"=>\"20432\"}",
"params" => "{\"download_token\"=>\"1495650173\",",
"\"study_observation_set_ids\"" => ">[\"\",",
"duration" => "2304.09",
"path" => "/projects/20432/report",
"view" => "2172.47",
"@timestamp" => 2017-05-26T16:49:13.296Z,
"@version" => "1",
"host" => "demo.central.miovision.com",
"action" => "show",
"\"bin_size\"" => ">\"3600\"},",
"time" => [
[0] "2017-05-24T18:23:02.619100+00:00",
[1] "2017-05-24"
],
"db" => "89.51",
"status" => "200"
}
```
@jsvd It doesn't always crash. I'm also not 100% sure whether that's the specific message causing the issue; it could be other messages in our logs.
I'm getting the same issue sporadically with the following pipeline. We've managed to reproduce it now with the log entry:
{"": "Exception on /v1/classify [POST]", "timestamp": "2017-08-24 02:48:00.564Z", "service": "mp-sps", "level": "ERROR"}
Looks like it can't handle an empty key in a JSON-structured log.
We're on 5.4.0.
```
input {
  beats {
    port => 5043
    ssl => true
    ssl_certificate => "..."
    ssl_key => "..."
  }
}
filter {
  if [type] == "tm-services-log" {
    json {
      source => "message"
    }
    # The @timestamp field is the time the record was picked up. timestamp is when it was written, which is better.
    date {
      match => ["timestamp", "ISO8601"]
      target => "@timestamp"
    }
    # Remove the surplus timestamp from above and some other things that are useless or misleading.
    mutate { remove_field => [ "@version", "_type", "beat", "host", "input_type", "offset", "source", "tags", "timestamp" ] }
  }
}
output {
  if [type] == "tm-services-log" {
    if [service] and [env_type] {
      elasticsearch {
        hosts => ["..."]
        ssl => true
        user => "..."
        password => "..."
        index => "tm-services-%{service}-%{env_type}-%{+xxxx.ww}"
      }
    } else {
      elasticsearch {
        hosts => ["..."]
        ssl => true
        user => "..."
        password => "..."
        index => "tm-services-unknown-%{+xxxx.ww}"
      }
    }
  }
}
```
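To check the empty-key suspicion in isolation, here is a small standalone Java sketch; it uses Jackson purely for illustration and none of it is Logstash code. It shows that the log entry really does parse to a map with an empty-string key, and includes a hypothetical guard that renames such keys before they would become field names:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone illustration (not Logstash code): the reported log line parses to a
// map whose first key is the empty string; a hypothetical guard renames it.
public class EmptyJsonKeyDemo {
    public static void main(String[] args) throws Exception {
        String log = "{\"\": \"Exception on /v1/classify [POST]\", "
                   + "\"timestamp\": \"2017-08-24 02:48:00.564Z\", "
                   + "\"service\": \"mp-sps\", \"level\": \"ERROR\"}";

        @SuppressWarnings("unchecked")
        Map<String, Object> parsed = new ObjectMapper().readValue(log, Map.class);
        System.out.println(parsed.containsKey(""));  // true -> an empty field name

        // Hypothetical mitigation: rename empty keys before they become field names.
        Map<String, Object> safe = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : parsed.entrySet()) {
            safe.put(e.getKey().isEmpty() ? "empty_key" : e.getKey(), e.getValue());
        }
        System.out.println(safe.keySet());           // [empty_key, timestamp, service, level]
    }
}
```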
Indeed this looks to be caused by a null or missing key. For example, with the following simple pipeline:
```
input {
  stdin {}
}
filter {
  kv {
    value_split => ":"
    field_split => ","
    trim_key => " "
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
}
```
This message would trigger it:
foo:2, :blah, bar:sgasgd
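Walking through why that line trips it up: with field_split => ",", value_split => ":" and trim_key => " ", the middle token " :blah" yields a key that is all spaces, and trimming reduces it to the empty string. A rough standalone Java sketch of that tokenization (not the plugin's real implementation) follows:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch (not the kv plugin's real code) of what the settings above do to
// the trigger line: the middle token produces an empty key after trimming.
public class KvEmptyKeySketch {
    public static void main(String[] args) {
        String line = "foo:2, :blah, bar:sgasgd";
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : line.split(",")) {   // field_split => ","
            String[] kv = pair.split(":", 2);   // value_split => ":"
            if (kv.length != 2) {
                continue;
            }
            String key = kv[0].trim();          // roughly what trim_key => " " does
            fields.put(key, kv[1]);
        }
        // Prints {foo=2, =blah, bar=sgasgd}: note the empty key, which is what
        // event.set() later fails on with ArrayIndexOutOfBoundsException: -1.
        System.out.println(fields);
    }
}
```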
As a workaround, the following mutate filter, placed before the kv filter, can catch values with a missing/null key and collect them under a single values_missing_keys field:
```
mutate {
  gsub => ["pwnie_message", ",\s+:", ",values_missing_keys:"]
}
```
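As a quick sanity check of the substitution, this illustrative Java snippet applies the same regex from the gsub above to the trigger line; the value that had no key ends up under values_missing_keys:

```java
// Illustrative only: the same pattern as the mutate/gsub workaround above,
// applied to the trigger line outside Logstash.
public class GsubWorkaroundDemo {
    public static void main(String[] args) {
        String message = "foo:2, :blah, bar:sgasgd";
        String rewritten = message.replaceAll(",\\s+:", ",values_missing_keys:");
        // Prints: foo:2,values_missing_keys:blah, bar:sgasgd
        System.out.println(rewritten);
    }
}
```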
I note the issue is non-fatal to Logstash: it still appears to be able to process subsequent events, but the event that triggered the exception appears to be lost.
This has been solved in 5.6.11 and 6.1.1 in core by https://github.com/elastic/logstash/commit/cdb4f0ca74993235bace13318410c099da6d1fce
Looks like there is something wrong with the kv filter; when we looked at the error log from Logstash, we got the stack trace summarized at the top of this issue. This has never happened with our old version of ELK. Any idea what's wrong with our kv filter? Or is this just a bug in 5.4?