Open stdweird opened 8 years ago
For future reference: the bug is that `UNIXPATH` does not accept a `+` character, and the grok pattern in this issue makes logstash hang on the following input data:
uid:1234 sid:5678 tty:(none) cwd:/som/path/long/long/long/++/mor/long/path/anonymized/file filename:/bin/cut: cut -d. -f2
The same issue occurs if, e.g., the cwd contains a space or similar. Switching from `UNIXPATH` to `DATA` fixed the issue.
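
The fix described above could look like the following sketch. It assumes both path fields are switched to `DATA` (the comment does not say whether one or both were changed) and keeps the field names from the pattern quoted in this issue.

```
filter {
  grok {
    # Assumption: cwd and executable both use DATA instead of UNIXPATH,
    # so a "+" or a space inside the path no longer affects the match.
    match => { "message" => "uid:%{INT:uid:int} sid:%{INT:sid:int} tty:%{DATA:tty} cwd:%{DATA:cwd} filename:%{DATA:executable}: %{GREEDYDATA:command}" }
  }
}
```

`DATA` is a lazy `.*?`, so each field ends at the next literal in the pattern (` filename:` and `: `) instead of validating the characters of the path.
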
Tips for the logstash people: it took me half a day to find the relevant message, and 20 minutes to figure out what was wrong with the pattern.
We have a logstash grok filter with a single `match => {message => '%{PATTERN}'}`, where `PATTERN` is made out of several other patterns joined with `|` (i.e. a grok file with `PATTERN %{PAT1}|%{PAT2}`, where each sub-pattern is itself a combination of more patterns).
Recently, we added a new pattern to the joined list, and logstash starts to consume large amounts of CPU after a while (about 30 minutes; by then it has parsed a few messages with the new pattern, so the new pattern itself seems fine).
But maybe we hit some internal threshold/buffer size/.... Is there a limit to the size of a single pattern in the `match => message`? We could split the patterns and use `match => { message => ['PAT1', 'PAT2', ...] }`, but would it improve anything?
I also found #37, but I don't think it's related. The pattern does have a `GREEDYDATA` at the end, but because it is at the end, it shouldn't matter, I think (the new pattern looks like `uid:%{INT:uid:int} sid:%{INT:sid:int} tty:%{DATA:tty} cwd:%{UNIXPATH:cwd} filename:%{UNIXPATH:executable}: %{GREEDYDATA:command}`).
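
For the question about splitting the alternation, a minimal sketch of the array form follows. `PAT1` and `PAT2` are the placeholder names used in this issue, and the `patterns_dir` path is hypothetical. `timeout_millis` and `tag_on_timeout` are grok filter options; whether they are available depends on the installed plugin version, which is an assumption here.

```
filter {
  grok {
    # Hypothetical directory holding the grok file that defines PAT1, PAT2, ...
    patterns_dir => ["/etc/logstash/patterns"]
    # Entries are tried in order; with the default break_on_match => true,
    # grok stops at the first pattern that matches.
    match => { "message" => ["%{PAT1}", "%{PAT2}"] }
    # Example value: give up on a single match attempt after 5 seconds.
    timeout_millis => 5000
    # Events whose match attempt timed out receive this tag.
    tag_on_timeout => "_groktimeout"
  }
}
```

Nothing in this thread establishes that the array form reduces CPU usage. With a timeout configured, though, an event that triggers a very slow match is tagged rather than stalling the filter, which may make the offending message easier to find.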