retoo opened this issue 8 years ago
+1 This is rather critical because it breaks grok patterns.
I changed the stdin plugin itself: https://github.com/logstash-plugins/logstash-input-stdin/pull/11
It solves the issue for me for now.
This continues to be a problem. It's caused by the multiline codec
not buffering incoming data and matching the pattern against partially received lines (e.g. the stdin
input sends data in chunks of 16384 bytes: https://github.com/logstash-plugins/logstash-input-stdin/blob/master/lib/logstash/inputs/stdin.rb#L17).
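To illustrate the mechanism (a standalone sketch, not actual plugin code): if each fixed-size chunk is decoded as though it were a complete line, every chunk after the first fails the "start of event" pattern and is wrongly treated as a continuation. The timestamp pattern and the small chunk size here are illustrative stand-ins.

```ruby
# Illustrative sketch: why reading fixed-size chunks breaks pattern
# matching against "lines". The stdin input uses READ_SIZE = 16384;
# a small value is used here so the effect is visible.
READ_SIZE = 16

data = "2017-01-01 12:00:00 a single long log line\n"
chunks = data.scan(/.{1,#{READ_SIZE}}/m)
# => ["2017-01-01 12:00", ":00 a single lon", "g log line\n"]

pattern = /^\d{4}-\d{2}-\d{2}/ # "a new event starts with a timestamp"
chunks.each do |chunk|
  # Only the first chunk happens to start with a timestamp; the rest
  # look like continuation lines even though no real line break exists.
  puts "#{chunk.inspect} starts new event? #{chunk.match?(pattern)}"
end
```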
Unfortunately, implementing a buffer (as has been suggested: https://github.com/logstash-plugins/logstash-codec-multiline/pull/38#issuecomment-225932673) does not seem possible as there's currently no way to flush it (see https://github.com/elastic/logstash/issues/6523).
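A minimal sketch of what the suggested buffer would look like (an assumption of the approach, not code from the PR): accumulate raw chunks and only emit complete lines, holding any trailing partial line back. The catch described above is the `flush` method: without an end-of-input signal from Logstash (elastic/logstash#6523) there is no safe point to call it, so the final partial line could sit in the buffer forever.

```ruby
# Sketch of chunk-to-line buffering for a streaming input.
class LineBuffer
  def initialize
    @partial = +""
  end

  # Returns complete lines seen so far; keeps any trailing partial
  # line until more data (or a flush) arrives.
  def feed(chunk)
    @partial << chunk
    return [] unless @partial.include?("\n")
    *lines, @partial = @partial.split("\n", -1)
    lines
  end

  # Would need to be driven by an end-of-input / shutdown signal,
  # which the codec currently has no way to receive.
  def flush
    leftover, @partial = @partial, +""
    leftover.empty? ? [] : [leftover]
  end
end
```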
https://github.com/logstash-plugins/logstash-input-stdin/issues/16 seems to be the same issue, tracked in the stdin input repo.
I am having the same issue using the TCP input plugin. I get random lines that are cut by a rogue \n that causes a false positive in the multiline codec (it adds the multiline tag and truncates the line). Is there any known issue with TCP?
@pheyos and I ran into this while using the stdin input as a simple way to test logstash configs.
It seems like it's been a known issue for years that the stdin input doesn't work with the multiline codec, but if you look at the documentation it says:
Read events from standard input.
By default, each event is assumed to be one line. If you want to join lines, you’ll want to use the multiline codec.
If nothing is going to be done about the underlying problem then I think at least the second paragraph of that section of the documentation should be changed to say something along the lines of "By default, each event is assumed to be one line. It is not possible to safely change this. You will lose data if you attempt to use the stdin input with the multiline codec."
I just ran into this issue streaming JSON from Lambda to Logstash via the TCP input. I am getting random \n characters in the data which, looking at the tcpdumps, do not exist.
At a minimum, the event should get a tag like _multilineerror if the codec is injecting data when it has problems.
Also, setting max_bytes and max_lines has no effect either. If there is no easy solution for buffering the data, how about making READ_SIZE a tunable variable? This is very annoying.
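For what it's worth, a tunable read size alone would not remove the problem, only move where the cut lands. A standalone demonstration (using StringIO to stand in for stdin; the read-size value here is arbitrary):

```ruby
require "stringio"

# Simulate a streaming input reading in fixed-size chunks.
# Whatever value READ_SIZE takes, a chunk boundary can still
# fall mid-line; tuning it only changes where the cut happens.
read_size = 8
io = StringIO.new("line one\nline two\n")

chunks = []
begin
  loop { chunks << io.sysread(read_size) }
rescue EOFError
  # end of stream
end

p chunks # first chunk ends mid-stream: "line one" with no newline
```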
This problem will occur with any other "streaming" inputs like udp and tcp. There is a larger issue of ambiguity between line-oriented and streaming inputs with codecs in our architecture that we are trying to solve correctly. In the meantime I have rebooted the work to fix this here until we have a globally better solution; see #63 and please let me know if you think this seems like a good solution.
(.. sorry wrong issue)
So, what is the latest on this issue? I am still getting this problem. I'm using Logstash version 6.7.0. Is there a workaround?
@hbrothman the latest proposal for this is #63 - let me know if this solution would work for you.
Sure, it looks fine to me. I assume it hasn't been implemented yet, and there is no current workaround. Any idea on how long it will take to be implemented? Thanks.
Sometimes the multiline codec adds random newlines although there are no newlines in the input file.