wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.
https://csvkit.readthedocs.io
MIT License

Can in2csv add a byte order mark (BOM) so that when opening csv in Excel it correctly formats unicode text? #1267

Open river-ride opened 1 week ago

river-ride commented 1 week ago

There is a short write-up here [https://hilton.org.uk/blog/csv-excel] that describes the issue: when you double-click a .csv file to open it, Excel doesn't recognise that it is UTF-8 encoded unless the file starts with a byte order mark.

This can be fixed by simply prepending the UTF-8 BOM when writing the CSV: echo -ne "\xEF\xBB\xBF" | cat - data.csv > data-with-BOM.csv
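For anyone who wants the same thing without relying on how the shell's echo handles escapes, here is a minimal Python sketch of that workaround. The function names add_utf8_bom and add_bom_to_file are made up for illustration and are not part of csvkit. It assumes the input is already UTF-8 and skips files that already start with a BOM.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Return data with the UTF-8 BOM (EF BB BF) in front, unless already present."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_to_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, prepending a UTF-8 BOM (hypothetical helper)."""
    # Read and write in binary mode so no newline or encoding translation occurs.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(add_utf8_bom(data))
```

Something like add_bom_to_file("data.csv", "data-with-BOM.csv") would then mirror the command above.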

It would be great if in2csv could do this by default in its CSV output, if possible.

Thanks

jpmckinney commented 1 week ago

Adding a BOM to all output will break a lot of CSV applications, which do not expect an extra 3 bytes.
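To illustrate the kind of breakage meant here, a small sketch using only Python's standard library (not csvkit itself); the sample data and the header helper are invented for the example. A consumer that decodes as plain utf-8 gets the BOM glued onto the first column name, while one that decodes as utf-8-sig strips it.

```python
import csv
import io

# Hypothetical CSV bytes as a BOM-adding writer would produce them.
raw = "\ufeffid,name\n1,Zoë\n".encode("utf-8")


def header(data: bytes, encoding: str) -> list:
    """Decode data with the given encoding and return the first CSV row."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))


naive = header(raw, "utf-8")      # BOM survives as U+FEFF in the first field
aware = header(raw, "utf-8-sig")  # BOM is stripped during decoding

print(naive[0] == "id")  # False: a lookup of the "id" column would fail
print(aware[0] == "id")  # True
```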

We could add an option to csvformat (the tool that controls output format – all other tools have a consistent output format), but it would not be much different from the command above.