Open · danhermann opened this issue 6 years ago
This is likely a result of event ordering not being guaranteed on Logstash 6+ when events are inserted into the queue between the inputs and the filters+outputs.
A proper fix requires synchronization across all worker threads, essentially a rearchitecture of a large portion of the filter.
The only current workaround is indeed setting workers => 1. Features like autodetect_column_names shouldn't rely on event ordering, as we don't guarantee it, especially for workers > 1.
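A minimal sketch of that workaround in logstash.yml (note it applies to every pipeline in the process; a per-pipeline alternative is shown further down):

```yaml
# logstash.yml -- caps worker threads for every pipeline in this process,
# so events reach the csv filter in the order the input produced them
pipeline.workers: 1
```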
Thanks for pointing this out; it helped me fix my issue where autodetect_column_names always messed up my mapping.
I've set workers => 1 to fix it.
However, I currently use the "pipeline.workers: 1" setting in logstash.yml, which impacts every pipeline. Is there a configuration item I could use in a specific pipeline's config, so that only the CSV input that needs the autodetect_column_names feature runs with a single worker?
Another issue: when I have 2 files and each file has a header, the header of the second file will still be loaded as data. Is there any way to deal with that?
@siben168, you can set the number of workers for each pipeline in the pipelines.yml
file. See more details here: https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
Unfortunately, as the filter is currently written, I don't know of a way to handle multiple files where each one has its own header row.
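A minimal sketch of such a pipelines.yml, with hypothetical pipeline IDs and config paths:

```yaml
# pipelines.yml
- pipeline.id: csv-ingest                                # hypothetical name
  path.config: "/etc/logstash/conf.d/csv-pipeline.conf"  # hypothetical path
  pipeline.workers: 1   # single worker only where event ordering matters
- pipeline.id: main                                      # hypothetical name
  path.config: "/etc/logstash/conf.d/other/*.conf"
  # pipeline.workers left unset defaults to the number of CPU cores
```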
I'm experiencing this in Logstash 5.6.2 as well.
Note that the new csv codec should be more appropriate for this. In particular, when paired with the file input, it uses a separate codec instance per file and is thus able to correctly adjust the columns for each of the potentially different files.
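A sketch of what that might look like, assuming the logstash-codec-csv plugin is installed and supports autodetect_column_names the same way the filter does (the path is illustrative):

```
input {
  file {
    path => "/tmp/data/*.csv"    # hypothetical path
    start_position => "beginning"
    # the file input creates one codec instance per file, so each
    # file's header row is detected independently of the others
    codec => csv {
      autodetect_column_names => true
    }
  }
}
output {
  stdout { codec => rubydebug }
}
```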
There's a race condition with the autodetect_column_names feature when there is more than one worker thread. The filter assumes that the first line of the CSV contains the column names, but with multiple worker threads, the filter may receive lines in a different order than they are presented in the input. skip_header may have a similar problem.
Noticed while debugging the java.lang.ArrayIndexOutOfBoundsException that was mentioned in this blog post: https://mikehillwig.com/2018/02/23/making-peace-with-logstash-part-2-parsing-a-csv/
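For anyone trying to reproduce this, a minimal sketch (the path and data are made up, and the failure is intermittent because it depends on thread scheduling):

```
# repro.conf -- run with the default worker count (one per CPU core)
input {
  file {
    path => "/tmp/people.csv"    # hypothetical CSV file with a header row
    start_position => "beginning"
    sincedb_path => "/dev/null"  # always re-read the file (testing only)
  }
}
filter {
  # with more than one worker, a data row can reach the filter before
  # the header row, so data values get misread as column names
  csv {
    autodetect_column_names => true
  }
}
output {
  stdout { codec => rubydebug }
}
```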