simonw / csvs-to-sqlite

Convert CSV files into a SQLite database
Apache License 2.0

Help with CSV parsing errors #18

Open janimo opened 6 years ago

janimo commented 6 years ago

Pandas read_csv throws an exception when it encounters a line that seems to have too many fields, but it can be made to skip these bad lines and report them on stderr if passed error_bad_lines=False. While Pandas does not make it easy to deal with these lines (https://github.com/pandas-dev/pandas/issues/5686), it would be nice if csvs-to-sqlite could offer something. Maybe parse the read_csv output, then traverse the file and save the bad lines separately so the user can fix and reprocess them?
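A minimal sketch of the "traverse the file and save the bad lines separately" idea, using only Python's standard csv module rather than Pandas. The function name `split_bad_rows` and the sample data are hypothetical and are not part of csvs-to-sqlite. It assumes the first row is the header and treats any row with more fields than the header as bad.

```python
import csv
import io


def split_bad_rows(source):
    """Split CSV rows into (good, bad) lists.

    `source` is any iterable of text lines, such as an open file.
    The first non-empty row is assumed to be the header. A row with
    more fields than the header is treated as bad and is returned
    together with its 1-based row number.
    """
    reader = csv.reader(source)
    good, bad = [], []
    expected_fields = None
    for row_number, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        if expected_fields is None:
            expected_fields = len(row)  # header defines the width
        if len(row) > expected_fields:
            bad.append((row_number, row))
        else:
            good.append(row)
    return good, bad


# Hypothetical sample: the third row has one field too many.
sample = io.StringIO(
    "id,name,city\n"
    "1,Ann,Oslo\n"
    "2,Bob,Paris,extra\n"
    "3,Cy,Rome\n"
)
good, bad = split_bad_rows(sample)
```

The bad rows could then be written to a separate file with `csv.writer` so they can be fixed by hand and reprocessed, while the good rows go on to the database.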

janimo commented 6 years ago

With --skip-errors one can take the stderr output, sed/grep those lines out of the CSV, and fix them up separately. It would still be helpful if this tool dumped the lines somewhere.
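The sed/grep workflow can also be sketched in Python. This assumes the stderr messages look like `Skipping line 3: expected 3 fields, saw 4`, which is how Pandas versions of that era worded the warning; the exact format may differ in other versions. The helper names and sample data are hypothetical. The reported numbers are taken to be 1-based line numbers in the file, which may not hold if quoted fields span several lines.

```python
import re

# Assumed message format; adjust the pattern if your output differs.
SKIP_PATTERN = re.compile(r"Skipping line (\d+): expected (\d+) fields, saw (\d+)")


def skipped_line_numbers(stderr_text):
    """Return the sorted, de-duplicated line numbers named in the warnings."""
    return sorted({int(m.group(1)) for m in SKIP_PATTERN.finditer(stderr_text)})


def extract_lines(csv_lines, line_numbers):
    """Return the lines of the file whose 1-based number is in line_numbers."""
    wanted = set(line_numbers)
    return [line for n, line in enumerate(csv_lines, start=1) if n in wanted]


# Hypothetical captured stderr and file contents.
stderr_text = "b'Skipping line 3: expected 3 fields, saw 4\\n'"
csv_lines = [
    "id,name,city",
    "1,Ann,Oslo",
    "2,Bob,Paris,extra",
    "3,Cy,Rome",
]

numbers = skipped_line_numbers(stderr_text)
bad_lines = extract_lines(csv_lines, numbers)
```

In practice `csv_lines` would come from reading the original file, and `bad_lines` would be written to a separate file for manual repair.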

Do you see csvs-to-sqlite as a tool that should eventually handle most scenarios by itself (error handling, remote files, compressed formats), or as one to be used alongside other established command-line tools?