logstash-plugins / logstash-filter-csv

Apache License 2.0

Blank line at start of file messes up autodetect_column_names. skip_empty_rows does not fix #74

Open TheVastyDeep opened 5 years ago

TheVastyDeep commented 5 years ago
    input {
        generator { count => 1 lines => [ '' ] }
        file { path => "/user/foo.csv" sincedb_path => "/dev/null" start_position => beginning }
    }
    filter { csv { autodetect_column_names => true skip_empty_rows => true } }
    output { stdout { codec => rubydebug { metadata => false } } }

Input file foo.csv, containing this (or any other valid CSV):

    a,b
    1,2

Run the configuration above against this data. It produces:

    [2019-06-11T01:14:06,397][WARN ][logstash.filters.csv     ] Error parsing csv {:field=>"message", :source=>"a,b", :exception=>#<NoMethodError: undefined method `empty?' for nil:NilClass>}
    [2019-06-11T01:14:06,405][WARN ][logstash.filters.csv     ] Error parsing csv {:field=>"message", :source=>"1,2", :exception=>#<NoMethodError: undefined method `empty?' for nil:NilClass>}

Because the generator's blank line is consumed as the header row, the detected set of columns is empty (nil, per the error above). The CSV's real header line then shows up in the rubydebug output as an ordinary event, which gives no strong hint as to the actual problem.

Moving the skip_empty_rows check above the autodetect_column_names check would improve things, although the UX would still not be great, since it requires the user to understand exactly what the problem is.
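The ordering problem can be modelled outside Logstash. This is a minimal Python sketch, not the plugin's Ruby code: `parse_line`, `CsvFilterSketch` and `run` are hypothetical names, and it assumes (as the `nil:NilClass` error suggests) that the blank line parses to nil and that this nil is stored as the column list. It compares the reported check order with the proposed one, where empty rows are dropped before header autodetection.

```python
import csv
from typing import List, Optional


def parse_line(line: str) -> Optional[List[str]]:
    """Parse one CSV line. Returning None for an empty line is an
    assumption meant to mirror Ruby's CSV.parse_line("") returning nil."""
    if line == "":
        return None
    return next(csv.reader([line]))


class CsvFilterSketch:
    """Hypothetical model of the csv filter's check order, not plugin code."""

    def __init__(self, skip_empty_rows_first: bool) -> None:
        self.columns: Optional[List[str]] = []  # no header detected yet
        self.skip_empty_rows_first = skip_empty_rows_first

    def filter(self, line: str) -> Optional[dict]:
        values = parse_line(line)

        # Proposed order: drop empty rows before header autodetection.
        if self.skip_empty_rows_first and values is None:
            return None

        # autodetect_column_names: the first row seen becomes the header.
        # If that row was empty, columns becomes None and len(None) raises
        # TypeError on the next row, standing in for Ruby's NoMethodError
        # on nil.empty?.
        if len(self.columns) == 0:
            self.columns = values
            return None  # the header row itself produces no event

        # skip_empty_rows as it sits in the reported order (after autodetect).
        if values is None:
            return None

        return dict(zip(self.columns, values))


def run(lines: List[str], skip_empty_rows_first: bool) -> list:
    """Feed lines through a fresh filter, recording errors like the WARN log."""
    f = CsvFilterSketch(skip_empty_rows_first)
    results: list = []
    for line in lines:
        try:
            results.append(f.filter(line))
        except TypeError as exc:
            results.append(f"error: {exc}")
    return results


if __name__ == "__main__":
    lines = ["", "a,b", "1,2"]  # generator's blank line, then foo.csv
    print(run(lines, skip_empty_rows_first=False))  # reported order
    print(run(lines, skip_empty_rows_first=True))   # proposed order
```

In the reported order, both `a,b` and `1,2` produce an error entry, matching the two WARN lines above. In the proposed order, the blank line is dropped, `a,b` becomes the header, and the data row comes out as `{'a': '1', 'b': '2'}`.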