pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Add a `newline` parameter to `read_csv` #17709

Open MarkRotchell opened 2 months ago

MarkRotchell commented 2 months ago

Description

Currently, if a CSV file uses `\r\n` as its line terminator and also contains `\n` inside a string field, the `\n` is interpreted as a new line, resulting in split records and often schema errors. It would be nice to be able to specify `\r\n` as the line terminator in that case.

Julian-J-S commented 2 months ago

There is an `eol_char` parameter, but, like the `separator` parameter, it is currently limited to a single character.

ritchie46 commented 2 months ago

> Currently if a csv uses \r\n for a new line character and also contains instances of \n within a string field, the \n will be interpreted as a new line,

Newline characters inside string fields need to be escaped by quoting the fields. Otherwise the CSV is invalid.
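
To illustrate the point about quoting, here is a minimal sketch that uses Python's standard-library `csv` module rather than Polars. The sample data is made up. A `\n` inside a quoted field stays part of the field, while `\r\n` still ends the record.

```python
import csv
import io

# Hypothetical sample: records end in \r\n, and one quoted field contains a bare \n.
raw = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# newline="" keeps Python from translating line endings before the csv module sees them.
rows = list(csv.reader(io.StringIO(raw, newline="")))

for row in rows:
    print(row)
```

The embedded `\n` ends up inside the second field of the first data row instead of starting a new record.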

deanm0000 commented 2 months ago

As a workaround, set `eol_char` to `\r`, then do a `str.replace` on the first column to get rid of the extra `\n` left at the start of each record.
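
The logic of this workaround can be sketched in plain Python, without Polars. This is only an illustration of the idea on made-up data, assuming unquoted fields and a comma separator; it does not show how `read_csv` itself handles the leftover characters.

```python
# Hypothetical sample: \r\n ends each record, and one unquoted field contains a bare \n.
raw = "id,comment\r\n1,first line\nsecond line\r\n2,plain\r\n"

# Step 1: treat \r alone as the end-of-line character (the eol_char="\r" idea).
# Drop the final fragment that holds nothing but the trailing \n.
records = [r for r in raw.split("\r") if r.strip("\n")]

# Step 2: every record after the first starts with the \n left over from \r\n.
# Remove only that one leading character so embedded newlines survive.
records = [r[1:] if r.startswith("\n") else r for r in records]

# Step 3: split each record into fields (naive split, no quoting support).
rows = [r.split(",") for r in records]

for row in rows:
    print(row)
```

Because the leftover `\n` always sits at the very start of a record, only the first column needs cleaning, which is why the suggestion targets that column.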