Closed: dawi closed this issue 7 years ago.
Could you supply an example testcase file that exhibits the problem?
Yes of course, I will create one.
The attached testcases.zip contains one pipeline configuration and two test directories.
Directory tests1 contains one test file with two test cases. Directory tests2 contains the same two test cases but in two separate files.
tests1 will run successfully without `--sockets`, but will fail with `--sockets`. tests2 will always run successfully.
Thanks, I'll have a look as soon as I can.
Many thanks for your efforts. :)
@dawi could you please try again with the codec `json_lines` instead of `json`. If it is still not working, please provide the error messages (set `--loglevel` to `DEBUG` and add `--logstash-output`).
I tried to quickly run your tests, but I failed, because you are using a quite new feature of the grok filter (`pattern_definitions`) and I don't have such a recent version of Logstash ready to run the tests.
Ok, it works with `json_lines`.

At the beginning I wanted to use `json_lines`, but maybe I used `json_line` instead of `json_lines` (which obviously cannot work) and came to the conclusion that I had to use the `json` codec to be able to test multiline messages.

Anyway, the problem exists with the `json` codec.
@dawi true, but this is not resolvable due to the way the plugin `logstash-input-unix` works. The difference between `logstash-input-stdin` and `logstash-input-unix` is that in https://github.com/logstash-plugins/logstash-input-stdin/blob/master/lib/logstash/inputs/stdin.rb#L37 the stdin plugin reads the input line by line (regardless of the codec used), whereas in https://github.com/logstash-plugins/logstash-input-unix/blob/master/lib/logstash/inputs/unix.rb#L88 the unix input reads the available data in chunks of up to 16384 bytes, and the identification of events within those chunks is left entirely to the codec. The `json` codec does not delimit events on a line-by-line basis; the stdin input compensates for this as described above, but the unix input does not.
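The effect can be illustrated with a small Python sketch (hypothetical, not the plugins' actual Ruby code): two JSON events arriving concatenated in one socket chunk cannot be parsed as a single document, while newline framing, as used by `json_lines`, recovers both.

```python
import json

# Two events as the unix input might deliver them: one chunk containing
# both, with no framing the json codec can rely on.
chunk_json = '{"msg": "event 1"}{"msg": "event 2"}'

# The same two events framed as json_lines: newline-delimited.
chunk_json_lines = '{"msg": "event 1"}\n{"msg": "event 2"}\n'

# The json codec expects exactly one document, so the chunk is invalid.
try:
    json.loads(chunk_json)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# With json_lines we can split on newlines first, then parse each line
# independently.
events = [json.loads(line) for line in chunk_json_lines.splitlines()]

print(json_ok)      # False: concatenated documents cannot be parsed
print(len(events))  # 2: newline framing recovers both events
```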
I suggest closing this issue, as it works fine with the `json_lines` codec.
Ok, I agree, but it would be good if the readme were more explicit about this. I am wondering if there is any reason to use `json` instead of `json_lines` at all with logstash-filter-verifier. If not, then maybe the use of this codec should be forbidden in logstash-filter-verifier, or a warning could be printed.
@dawi currently the readme states that the codec normally should be one of `line` or `json_lines` (https://github.com/magnusbaeck/logstash-filter-verifier/blame/master/README.md#L202). Additionally, there is a hint for the usage with `--sockets` that in this case it is especially important to use either `line` or `json_lines` (https://github.com/magnusbaeck/logstash-filter-verifier/blame/master/README.md#L251).

Also, LFV defaults to the `line` codec, which works in both cases (with or without `--sockets`).

What else do you have in mind? If you want the readme to be more explicit about this issue, maybe you could create a PR.
@breml Yes, I will think about it. But I find it difficult to decide what makes sense and what does not, since I have only been using Logstash for two weeks now. I am currently wondering if it makes sense to use LFV with any codec other than `line` or `json_lines`. And if not, why not forbid the use of codecs that are known to cause errors in some cases?
Issuing a moratorium on other codecs is probably a mistake since someone's bound to figure out clever ways to make use of other codecs (possibly custom ones that we don't even know exist). However, warning users that the codec they've configured most likely isn't the best choice would be totally doable. What do you think?
TL;DR: I think it is safe to raise a warning if a user uses a codec other than `logstash-codec-line` or `logstash-codec-json_lines` together with `--sockets`.
In my opinion the main issue with `logstash-input-unix` (as well as `logstash-input-tcp`) is that it is not an application-level protocol, which has a definition of a message, but rather a transport protocol, which transports a stream of data (message = log event in this case). It is the responsibility of the application-layer protocol to define when a message ends and the next message starts. So we actually use the codecs `logstash-codec-line` and `logstash-codec-json_lines` to split our data stream into messages (our "protocol", from LFV's point of view, is that each message is separated by a newline).
`logstash-input-stdin` in this regard acts quite similarly to an application-layer protocol, because every line of input is automatically considered a message.
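In other words, the stdin input does the framing itself before the codec ever sees the data. A minimal Python illustration (again hypothetical, not the plugin's Ruby code) of that line-oriented reading:

```python
import io
import json

# Simulate the stdin input: the raw stream is consumed line by line, and
# each line is handed to the codec as one complete message.
stream = io.StringIO('{"msg": "event 1"}\n{"msg": "event 2"}\n')
events = [json.loads(line) for line in stream if line.strip()]

# Even the plain json codec works here, because it only ever sees one
# document at a time.
print(len(events))  # 2
```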
This means that all the codecs which assume they receive the messages already properly separated (e.g. `logstash-codec-csv`, `logstash-codec-compress_spooler`) will not work in our current setup.
There is another problem: LFV does not allow configuring the codec plugin, which means our "application-layer protocol" (each message on its own line) must be supported by the codec by default. For example, `logstash-codec-cef` would allow configuring a delimiter (which could be `\n`), but by default none is set, which means this codec also does not work with LFV.
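For illustration, a hypothetical input configuration (not something LFV currently generates) showing what would have to be set for the `cef` codec to fit LFV's newline framing:

```
input {
  unix {
    path => "/tmp/lfv.sock"
    # The delimiter is unset by default, and LFV offers no way to set it,
    # so cef events would never be split on newlines.
    codec => cef { delimiter => "\n" }
  }
}
```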
So in the end, I think there are only a few codecs which could possibly work with LFV at the moment:

- `logstash-codec-gzip_lines`
- `logstash-codec-es_bulk`
- `logstash-codec-graphite`
- `logstash-codec-edn_lines`

So, I do not expect the majority of the codecs to currently work with LFV.
Thanks for the analysis @breml! I've pushed a commit that adds a warning when select codecs are used.
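A check along those lines could look roughly like this (a Python sketch under assumed names; LFV's actual implementation and codec list differ):

```python
# Codecs that frame events on newlines and are known to work with --sockets
# (list assumed here for illustration; see the LFV readme for specifics).
SAFE_CODECS = {"line", "json_lines"}

def codec_warning(codec, use_sockets):
    """Return a warning string for a risky codec choice, or None."""
    if use_sockets and codec not in SAFE_CODECS:
        return ("warning: codec %r does not delimit events with newlines; "
                "consider 'line' or 'json_lines' with --sockets" % codec)
    return None

print(codec_warning("json", True))        # emits a warning string
print(codec_warning("json_lines", True))  # None: safe codec
print(codec_warning("json", False))       # None: no --sockets, no warning
```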
I have a problem testing multiline messages with Logstash Filter Verifier, and I am not sure if it is a bug or intended behaviour. Either way, a section in the readme about testing multiline messages could help a lot.

I am using the `json` codec to test multiline messages.

The issue is that if you use the `--sockets` flag to speed up the tests, you cannot have more than one multiline test case per test file.
In this case you currently have two options:
Is there a reason that it is not possible to have multiple multiline testcases in one file in case you use the --sockets flag?