Closed retoo closed 8 years ago
@retoo Can you please perform step 2 of https://github.com/elasticsearch/logstash/blob/master/CONTRIBUTING.md#contribution-steps
I feel this should use BufferedTokenizer since it solves this problem and is known historically to Logstash as a reusable solution for line splitting.
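For reference, the BufferedTokenizer pattern accumulates incoming chunks and only emits complete, delimiter-terminated lines. This is a minimal, self-contained sketch of that idea (the class name is illustrative; it is not Logstash's actual implementation):

```ruby
# Sketch of the BufferedTokenizer pattern: accumulate data across calls
# and emit only complete, delimiter-terminated lines. Hypothetical class
# name; Logstash's real tokenizer lives in its core utilities.
class SimpleBufferedTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = ""
  end

  # Append a chunk and return only the complete lines; the trailing
  # partial line (if any) stays buffered for the next call.
  def extract(data)
    @buffer << data
    lines = @buffer.split(@delimiter, -1)
    @buffer = lines.pop  # last element is "" when data ended with the delimiter
    lines
  end

  # Return whatever remains in the buffer (e.g. at end of stream).
  def flush
    remainder = @buffer
    @buffer = ""
    remainder
  end
end

tok = SimpleBufferedTokenizer.new
tok.extract("foo\nba")   # => ["foo"]
tok.extract("r\nbaz")    # => ["bar"]
tok.flush                # => "baz"
```

The key point is that a read boundary falling mid-line is invisible to the consumer: "ba" is held back until the rest of the line arrives.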
On Tuesday, June 14, 2016, Reto Schüttel notifications@github.com wrote:
The input plugin might not align its reads to newline boundaries (e.g. stdin). For example, stdin reads 16384 bytes and passes them to the codec, regardless of whether that read ends with a newline. This commit fixes the problem by keeping the last partial line and prepending it to the next decode call.
In our environment this change didn't have any noticeable effect on performance.
Fixes issue #37: https://github.com/logstash-plugins/logstash-codec-multiline/issues/37.
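The partial-line handling described in the PR can be sketched as follows. This is a hedged illustration of the approach, not the actual multiline codec code; the class and method names are hypothetical:

```ruby
# Sketch of the fix described above: keep the trailing partial line from
# one decode call and prepend it to the next. Illustrative names only,
# not the real logstash-codec-multiline implementation.
class PartialLineDecoder
  def initialize
    @remainder = ""
  end

  # Yields each complete line in the chunk; any trailing partial line
  # is buffered and prepended to the next decode call.
  def decode(data)
    lines = (@remainder + data).split("\n", -1)
    @remainder = lines.pop || ""
    lines.each { |line| yield line }
  end

  # At end of input, emit the buffered partial line, if any.
  def flush
    yield @remainder unless @remainder.empty?
    @remainder = ""
  end
end
```

With this in place, a 16384-byte read that ends mid-line no longer produces a truncated event: the fragment is simply carried over until the newline arrives (or until flush at stream end).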
This has to take auto-flush into account.
I would prefer not to do this as I am sure that Event Milling will take this into account.
Yes, the autoflush part is completely broken today. On the other hand the codec is completely useless for us without the fix. Is stdin the only input plugin having this problem?
Yesterday I briefly tried to fix the tests with something simple, but failed. I'll give it another shot sometime today.
@retoo - Do you know how to use LS with your own fork of a plugin?
It may be easier to fork stdin and add BufferedTokenizer, like the line codec does today, then use the fork.
Both stdin and multiline codec are unlikely to change very much until we deliver Event Milling and you can always pull bugfixes from upstream.
Event Milling is meant to provide a flexible mini-pipeline inside an input that allows a user to direct LS exactly how they want the data chunk to be processed.
@guyboertje I changed the stdin plugin, it's much cleaner this way.
I'll close this PR and suggest we adopt: https://github.com/logstash-plugins/logstash-input-stdin/pull/11
Until we move forward with proper boundary detection across inputs/codecs, I am proposing this solution to the multiline codec logstash-plugins/logstash-codec-multiline#63
IMHO the stdin input plugin should be thrown out of Logstash if it isn't fixed.
It is not just broken, it's broken in a way that ruins your day when you finally realize that something is screwing with your log files.
@retoo You are witnessing this problem using the multiline codec right? (per #37). The stdin input is not broken in the way you think it is. It correctly reads blocks of data and passes them to the underlying codec. It is the codec's job to correctly deal with data across blocks. I have recently submitted logstash-plugins/logstash-codec-multiline#63 to fix this problem in the multiline codec.
If you are interested in helping some more with this you could try this fix locally or we could arrange to get a test build of the codec to see how it works for you?
Thanks for the explanation. I'm no longer using logstash in that particular setup, I just got a bit frustrated about seeing it still open after having a working fix in 2016 ;)
@retoo I totally understand, we'll try to do better. Thanks for your contribution.