logstash-plugins / logstash-filter-csv

Apache License 2.0

Handle new lines within fields of a record #34

Open ghost opened 8 years ago

ghost commented 8 years ago

Hi, I am currently using Logstash 2.3.0 with the csv filter to specify and match columns on CSV data read by Filebeat. The issue is that, in a few cases, a row in the CSV file spans multiple lines, i.e. there is a CRLF before the end of the row (the file is a table export, and some of the data in the table contains line breaks).

For example, my filter:

    csv { columns => ['ID','NAME','GRADE','SUBJECTS','EOF'] }

Works for this CSV data:

    1, ABC, FIVE,COMPUTERS,$$$
    2, EFG, FIVE,SCIENCE,$$$

Fails for this data (a CRLF is present before the end of row 3):

    3, ABCV, FIVE, COMPUTERS SCIENCE,$$$
    4, ABCV, FIVE, COMPUTERS,$$$

So the rows containing a CRLF get rejected with a parse exception. Is there a way I can specify a row separator (in my case it is $$$CRLF)?

Or is there any configuration which I can use to manage this scenario?

Please suggest. Thanks
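A rough sketch of the behaviour being asked for, written in plain Ruby with the standard csv library rather than as a Logstash configuration. It treats `$$$` followed by CRLF as the end of a record and flattens any other line break inside the record to a space. The helper name `split_records` and the position of the stray CRLF in row 3 are assumptions made for illustration; this is not an option the csv filter offers.

```ruby
require "csv"

# Hypothetical helper: split raw text into records at a CRLF that directly
# follows the "$$$" end-of-row marker, then replace any line break left
# inside a record with a single space.
def split_records(raw)
  raw.split(/(?<=\$\$\$)\r\n/).map { |record| record.gsub(/\r?\n/, " ") }
end

# Sample data modelled on the rows above. The CRLF between COMPUTERS and
# SCIENCE is an assumed position for the embedded line break.
raw = [
  "1, ABC, FIVE,COMPUTERS,$$$\r\n",
  "3, ABCV, FIVE, COMPUTERS\r\nSCIENCE,$$$\r\n"
].join

rows = split_records(raw).map { |record| CSV.parse_line(record) }
rows.each { |row| p row }
```

In a real pipeline the joining would have to happen before the event reaches the csv filter, for example where the file is read.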

markwalkom commented 8 years ago

I was just about to raise something like this as I have the same issue :)

If you open a CSV file in LibreOffice or similar, it handles the CR/LF because it knows the lines belong to the same field: it keeps looking for the next comma.

We should really do the same here if we can!

markwalkom commented 8 years ago

RFC 4180 says that line breaks inside CSV fields should be handled - https://tools.ietf.org/html/rfc4180

(You can blame @PhaedrusTheGreek for pointing that out on the forums :p)

PhaedrusTheGreek commented 8 years ago

I believe the spec allows that only if the field is enclosed in double quotes.
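To illustrate the quoting rule, here is a small sketch using Ruby's standard csv library outside Logstash. The sample rows are adapted from the data earlier in the thread, and the variable names are illustrative only.

```ruby
require "csv"

# Quoted field: the embedded CRLF is kept as part of the field value.
quoted = "3,ABCV,FIVE,\"COMPUTERS\r\nSCIENCE\",$$$\r\n"
quoted_rows = CSV.parse(quoted)

# Unquoted field: the same CRLF is read as the end of the record,
# so the data comes back as two separate rows.
unquoted = "3,ABCV,FIVE,COMPUTERS\r\nSCIENCE,$$$\r\n"
unquoted_rows = CSV.parse(unquoted)

p quoted_rows
p unquoted_rows
```

So an export that wraps multi-line values in double quotes can be parsed as a single record, provided the whole record reaches the parser as one string.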

markwalkom commented 8 years ago

True!

OrangeDog commented 7 years ago

It also chokes on a trailing \r, which is what I end up with after using the multiline codec to join all the quoted line endings.

Error parsing csv {:field=>"message", :source=>"one,two,\"three\n\",four\r", :exception=>#<CSV::MalformedCSVError: Unquoted fields do not allow \r or \n (line 1).>}
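The failure can be reproduced outside Logstash with Ruby's standard csv library, using the :source string from the error above. The second half is a possible workaround sketched under the assumption that the trailing \r can simply be dropped before parsing. It is not something the csv filter does itself.

```ruby
require "csv"

# The :source value from the error message, including its trailing "\r".
source = "one,two,\"three\n\",four\r"

begin
  CSV.parse_line(source)
  raised = false
rescue CSV::MalformedCSVError => e
  raised = true
  puts "parse failed: #{e.message}"
end

# Assumed workaround: remove the trailing "\r" before parsing.
fields = CSV.parse_line(source.chomp("\r"))
p fields
```

The line feed inside the quoted field is preserved. Only the stray carriage return at the end of the record is removed.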

rickyk586 commented 4 years ago

may be helpful: https://stackoverflow.com/questions/44640604/logstash-parse-multiline-csv-file