logstash-plugins / logstash-codec-multiline

Apache License 2.0

stdin/multiline-codec adds random newlines #37

Open retoo opened 8 years ago

retoo commented 8 years ago

Sometimes the multiline codec adds random newlines although there are no such newlines in the input file.

robin13 commented 8 years ago

+1 This is rather critical because it breaks grok patterns.

retoo commented 8 years ago

I've changed the stdin plugin itself: https://github.com/logstash-plugins/logstash-input-stdin/pull/11

This solves the issue for me for now.

cwurm commented 7 years ago

This continues to be a problem. It's caused by multiline not buffering incoming data and matching the pattern against partially received lines (e.g. the stdin input will send data in chunks of 16384 bytes: https://github.com/logstash-plugins/logstash-input-stdin/blob/master/lib/logstash/inputs/stdin.rb#L17).
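To illustrate the failure mode described above, here is a minimal sketch in Python (not the codec's actual Ruby code). It assumes the input hands over fixed-size chunks and that each chunk is split into lines on its own; `read_in_chunks`, `split_per_chunk`, and the 16-byte read size are hypothetical names and values chosen to keep the example small.

```python
def read_in_chunks(data, read_size):
    """Simulate an input that delivers data in fixed-size reads."""
    return [data[i:i + read_size] for i in range(0, len(data), read_size)]


def split_per_chunk(chunks):
    """Split every chunk into lines independently, with no carry-over.

    A line that straddles a chunk boundary comes out as two lines,
    which looks like a newline was inserted at the boundary.
    """
    lines = []
    for chunk in chunks:
        lines.extend(chunk.splitlines())
    return lines


data = "first line of the event\nsecond line\n"
chunks = read_in_chunks(data, 16)  # tiny stand-in for the 16384-byte read
lines = split_per_chunk(chunks)

for line in lines:
    print(repr(line))
```

The two input lines come out as four fragments, and each fragment is then matched against the multiline pattern as if it were a complete line.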

Unfortunately, implementing a buffer (as has been suggested: https://github.com/logstash-plugins/logstash-codec-multiline/pull/38#issuecomment-225932673) does not seem possible as there's currently no way to flush it (see https://github.com/elastic/logstash/issues/6523).
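As a rough sketch of why a buffer needs a flush, the following Python class carries the unfinished tail of each chunk over to the next read. `LineBuffer`, `feed`, and `flush` are hypothetical names for illustration only and are not part of any Logstash API.

```python
class LineBuffer:
    """Accumulate chunks and release only complete, newline-terminated lines."""

    def __init__(self):
        self._pending = ""

    def feed(self, chunk):
        """Add a chunk and return the lines it completes."""
        self._pending += chunk
        # Everything before the last "\n" is complete; the rest stays buffered.
        *complete, self._pending = self._pending.split("\n")
        return complete

    def flush(self):
        """Release whatever is still buffered, e.g. when the input closes."""
        remainder, self._pending = self._pending, ""
        return [remainder] if remainder else []


buf = LineBuffer()
first = buf.feed("partial li")
second = buf.feed("ne\nlast line without newline")
tail = buf.flush()
```

Here the final line never arrives through `feed` because no newline follows it. It only becomes visible when something calls `flush`, which is the hook the comment above says is missing.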

PhaedrusTheGreek commented 6 years ago

https://github.com/logstash-plugins/logstash-input-stdin/issues/16 seems like the same thing, tracked in the stdin input repo

boblatino commented 6 years ago

I am having the same issue using the TCP input plugin. I get random lines that are cut by a rogue \n that causes a false positive in the multiline codec (it adds the multiline tag and truncates the line). Is there any known issue with TCP?

droberts195 commented 6 years ago

@pheyos and I ran into this while using the stdin input as a simple way to test logstash configs.

It seems like it's been a known issue for years that the stdin input doesn't work with the multiline codec, but if you look at the documentation it says:

Read events from standard input.

By default, each event is assumed to be one line. If you want to join lines, you’ll want to use the multiline codec.

If nothing is going to be done about the underlying problem then I think at least the second paragraph of that section of the documentation should be changed to say something along the lines of "By default, each event is assumed to be one line. It is not possible to safely change this. You will lose data if you attempt to use the stdin input with the multiline codec."

edperry commented 6 years ago

I just ran into this issue streaming JSON from Lambda to the Logstash TCP input. I am getting random \n characters in the data which, looking at the tcpdump captures, do not exist in what was sent.

At a minimum, the data should get a tag such as _multilineerror if the codec is injecting data when it has problems.

edperry commented 6 years ago

Also, adding max_bytes and max_lines has no effect. If there is no easy solution to buffering the data, how about making READ_SIZE a tunable variable? This is very annoying.
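For a sense of what a tunable read size would and would not change, here is a small Python sketch. It assumes the same simplified model as the rest of this thread (fixed-size reads, each split into lines independently); `count_split_points` is a hypothetical helper, and real TCP reads may return fewer bytes than requested, so the numbers are only indicative.

```python
def count_split_points(data, read_size):
    """Count chunk boundaries that fall in the middle of a line."""
    split_points = 0
    for start in range(read_size, len(data), read_size):
        # A boundary is harmless only if the previous chunk ended on a newline.
        if data[start - 1] != "\n":
            split_points += 1
    return split_points


line = "x" * 99 + "\n"   # one 100-byte line
stream = line * 10       # 1000 bytes in total

for read_size in (16, 256, 2000):
    print(read_size, count_split_points(stream, read_size))
```

Under these assumptions, a larger read size lowers the number of broken lines, but they only disappear once a single read covers the whole stream.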

colinsurprenant commented 5 years ago

This problem will occur with any other "streaming" inputs such as udp or tcp. There is a larger issue of ambiguity between line-oriented and streaming inputs with codecs in our architecture that we are trying to solve correctly. In the meantime, I have rebooted the work to fix this here until we have a globally better solution; see #63. Please let me know if you think this seems like a good solution.

retoo commented 5 years ago

(.. sorry wrong issue)

hbrothman commented 5 years ago

So, what is the latest on this issue? I am still getting this problem. I'm using Logstash version 6.7.0. Is there a workaround?

colinsurprenant commented 5 years ago

@hbrothman the latest proposal for this is #63 - let me know if this solution would work for you.

hbrothman commented 5 years ago

Sure, it looks fine to me. I assume it hasn't been implemented yet and that there is no current workaround. Any idea how long it will take to be implemented? Thanks.