Closed guysoft closed 7 years ago
Sure, a PR is welcome. There are two places you will need to look at, depending on the Python version:
For python 2, please look at https://github.com/pyexcel/pyexcel-io/blob/master/pyexcel_io/fileformat/_csv.py#L58. UnicodeWriter is the one. You may try to optimise it.
For python 3, please look at https://github.com/pyexcel/pyexcel-io/blob/master/pyexcel_io/fileformat/_csv.py#L197.
The success criteria (or test cases) are:
Do you have those tests written anywhere?
Tests for item 2 exist, but tests for items 1 and 3 need to be written when the change is made.
Can you provide a link to the tests for reading? I want to know what to run so I don't develop the wrong thing. Assume there is no BOM at the moment; I will extend it if needed.
Here is the test code for reading normal csv. Please check whether it suits your purpose. You can find all the test code under the "tests" directory.
Sorry, I am not getting the time to come round to this. However, I can confirm that the BOM also makes reading the CSV a problem: the characters are added to the end of the first column.
OK, good and surprising news:
it's possible to read and write files with a UTF-8 BOM when using the utf-8-sig
encoding, like this:
Reading:
records = pyexcel.get_records(file_name=csv_path, encoding="utf-8-sig")
Writing:
pyexcel.save_as(array=sheet, dest_file_name=data_csv, dest_encoding="utf-8-sig")
I am not sure if there is a need for a PR now, but I really suggest adding this to the documentation.
(Found this here when I was searching for how to read with a UTF-8 BOM)
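For anyone wondering why `utf-8-sig` helps, here is a minimal sketch using only the standard library `csv` and `io` modules rather than pyexcel itself. It assumes pyexcel hands the `encoding` value to Python's codec machinery more or less unchanged, which I have not verified; the sample rows and the `read_rows` helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample data, not taken from this issue.
rows = [["name", "city"], ["José", "Kraków"]]

# Build CSV text in memory, then encode it with "utf-8-sig":
# the encoder prepends the three BOM bytes EF BB BF.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
data = buffer.getvalue().encode("utf-8-sig")
has_bom = data.startswith(b"\xef\xbb\xbf")


def read_rows(raw, encoding):
    """Decode raw bytes with the given encoding and parse them as CSV."""
    text = raw.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


# Plain "utf-8" keeps the BOM as a U+FEFF character in the first cell.
plain = read_rows(data, "utf-8")
# "utf-8-sig" strips the BOM while decoding.
clean = read_rows(data, "utf-8-sig")
```

With plain `utf-8` the first cell comes back with a stray `'\ufeff'` in front of it, while `utf-8-sig` returns the rows exactly as written.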
Good news indeed. I will document it before closing this issue.
A UTF-8 encoded CSV should include a BOM at the start to signal that it is using UTF-8. There are several CSV readers (I think Microsoft Excel is one of them) that look for those bytes to decide whether to read the CSV as Unicode. It's also [part of Unicode](http://unicode.org/faq/utf_bom.html#bom2).
You can get the utf8 BOM like this:
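One way to do that in Python 3, assuming the standard library `codecs` module is what was meant here:

```python
import codecs

# codecs.BOM_UTF8 is the UTF-8 byte order mark as a bytes object.
bom = codecs.BOM_UTF8
print(bom)
```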
You should see `b'\xef\xbb\xbf'`. This should be added to the start of a generated csv. If you could point me to where it is located, I can create a pull request.

Workaround I use now (in python3, where I have to use `'\ufeff'` because of how it opens a file):
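A minimal sketch of a workaround along those lines, not necessarily the code originally posted. It assumes the file is opened in text mode with the `utf-8` encoding; the function name `write_csv_with_bom` and the sample rows are hypothetical.

```python
import csv
import io


def write_csv_with_bom(stream, rows):
    """Write rows as CSV to a text stream, starting with a BOM character."""
    # In text mode the BOM is the single character U+FEFF; the stream's
    # UTF-8 encoder turns it into the bytes EF BB BF on output.
    stream.write("\ufeff")
    csv.writer(stream).writerows(rows)


# An in-memory binary buffer stands in for a real file here.
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
write_csv_with_bom(text, [["a", "b"], ["1", "2"]])
text.flush()
output = raw.getvalue()
```

With a real file, `open(path, "w", encoding="utf-8", newline="")` would take the place of the `TextIOWrapper`. Opening the file with `encoding="utf-8-sig"` instead makes the manual `'\ufeff'` write unnecessary.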