wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.
https://csvkit.readthedocs.io
MIT License

Can in2csv add a byte order mark (BOM) so that when opening csv in Excel it correctly formats unicode text? #1267

Open river-ride opened 1 month ago

river-ride commented 1 month ago

There is a short write-up at https://hilton.org.uk/blog/csv-excel that describes the issue: when you double-click a .csv file to open it, Excel doesn't recognise that it is UTF-8 encoded unless the file starts with a byte order mark.

This can be fixed by simply prepending the UTF-8 BOM when writing the csv: echo -ne "\xEF\xBB\xBF" | cat - data.csv > data-with-BOM.csv

It would be great if in2csv could include this as standard in its csv output, if possible.

Thanks

jpmckinney commented 1 month ago

Adding a BOM to all output will break a lot of CSV applications, which do not expect an extra 3 bytes.
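To illustrate the kind of breakage meant here, a minimal sketch using only Python's standard library (this is not csvkit code, and the sample data and column names are made up): a reader that decodes the file as plain UTF-8 keeps the BOM as part of the first column name, while the "utf-8-sig" codec strips it.

```python
import csv
import io

# Hypothetical CSV payload with a UTF-8 BOM prepended (made-up data).
raw = b"\xef\xbb\xbfid,name\n1,Zo\xc3\xab\n"

# Decoding as plain UTF-8 keeps the BOM as U+FEFF at the start of the first header.
naive = csv.DictReader(io.StringIO(raw.decode("utf-8")))
naive_fields = naive.fieldnames

# Decoding as "utf-8-sig" drops a leading BOM if one is present.
aware = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))
aware_fields = aware.fieldnames

print(naive_fields)
print(aware_fields)
```

A consumer that looks up the "id" column by name would fail on the first reader, because the header it sees is "\ufeffid".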

We could add an option to csvformat (the tool that controls output format – all other tools have a consistent output format), but it will not be much different than that command.
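For comparison, a rough Python equivalent of that shell command, as a sketch only: `add_bom` is a hypothetical helper name, not a csvkit function or option, and it assumes the input file is already UTF-8 encoded. Unlike the shell one-liner, it skips the BOM if the file already starts with one.

```python
import codecs
import os
import shutil
import tempfile


def add_bom(src_path, dst_path):
    """Copy src_path to dst_path, prepending a UTF-8 BOM if it is missing.

    Hypothetical helper for illustration; assumes src_path is UTF-8 encoded.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(len(codecs.BOM_UTF8))
        if head != codecs.BOM_UTF8:
            dst.write(codecs.BOM_UTF8)
        dst.write(head)
        # Stream the remainder so large files are not loaded into memory.
        shutil.copyfileobj(src, dst)


# Usage example on a throwaway file (made-up data).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.csv")
    dst = os.path.join(tmp, "data-with-BOM.csv")
    with open(src, "wb") as f:
        f.write("id,name\n1,Zoë\n".encode("utf-8"))
    add_bom(src, dst)
    with open(dst, "rb") as f:
        result = f.read()
```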

river-ride commented 1 week ago

OK - understood. Thanks for responding. It would be nice as a format option so that another process doesn't have to be run to make them clickable.

One other thing, if I may... we have a column in Excel that holds a percentage as a string, and in2csv unfortunately (and probably sensibly) strips out the % sign. But this needs to be output as a string, as it gets passed on to a data visualisation app (we have a separate column for the decimal percentage). Is there any way to get in2csv to respect the string formatting of the xlsx?

jpmckinney commented 5 days ago

Please open a new issue for your second question. I cannot replicate it with an XLSX file that has "%" as one of the column names. Please attach a file that reproduces the problem to that new issue.