UTF-8 BOM causes failure

lkingsford commented 4 years ago

Current Behaviour

When you create a UTF-8 CSV file with Excel, it writes a UTF-8 byte order mark (BOM) in the first 3 bytes: EF BB BF. When a BOM is present, the script reads the first header item with the BOM included in the string, and can't find the rectangle. The error is listed under "Example error" below. When you manually remove the BOM (for instance, by changing the encoding in VSCode), the file loads correctly, but non-ASCII UTF-8 characters are not read correctly ("Nurſe" in the CSV becomes "NurÅ¿e" on the card, but this is a different issue).
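Both symptoms can be reproduced from the raw bytes alone. The sketch below is not the extension's code. It only decodes the same bytes in different ways, on the assumption (not confirmed in this thread) that the CSV is being decoded as cp1252, the default on many Windows setups. The variable names are made up for illustration.

```python
# Illustration only: decode the bytes Excel writes in a few different ways.
BOM_UTF8_BYTES = b"\xef\xbb\xbf"  # the three BOM bytes EF BB BF

# First header cell as it sits on disk: BOM followed by the column name.
raw_header = BOM_UTF8_BYTES + "monster".encode("utf-8")

# Assumption: the reader decodes with cp1252 instead of UTF-8.
as_cp1252 = raw_header.decode("cp1252")

# Plain UTF-8 keeps the BOM as an invisible U+FEFF character.
as_utf8 = raw_header.decode("utf-8")

# utf-8-sig drops a leading BOM if one is present.
as_utf8_sig = raw_header.decode("utf-8-sig")

# The same cp1252 assumption applied to a non-ASCII cell value.
long_s_cell = "Nurſe".encode("utf-8").decode("cp1252")

print(repr(as_cp1252))
print(repr(as_utf8))
print(repr(as_utf8_sig))
print(repr(long_s_cell))
```

Under that assumption the cp1252 decode yields the id from the error message, and the cell value comes out as the garbled text seen on the card, so the two problems may share a cause.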

Expected Behaviour

The BOM is read and the byte-order of the file set accordingly. Alternatively, if other architectures are not a concern, the BOM is ignored.

Justification

The BOM is not uncommon, at least in the Microsoft space: it is produced by (at least) Visual Studio, Excel and Notepad, as well as Google Docs. I use Excel to edit the CSVs that hold the data for my game.

Additional information

Example error

Unable to find rectangle with id 'ï»¿monster' that was specified in the CSV data file.
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='E:\\Dropbox\\18Dracula\\1.2\\log.log' mode='w' encoding='cp1252'>

Example CSV file

Attached

Data example.zip

lkingsford commented 4 years ago

If you can advise whether you expect the BOM to be useful (for instance, if the CSV is made on a different processor), then I can probably make a pull request with a fix when I get some time later this week.

lifelike commented 4 years ago

This has to be solved, and it looks like there is a simple solution in Python: change the encoding used when opening the file from utf-8 to utf-8-sig.
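A minimal sketch of that change, assuming the CSV is read with the standard csv module. The helper name read_rows, the file name data.csv and the role column are hypothetical and not taken from the extension.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Read a CSV into a list of dicts, tolerating an optional UTF-8 BOM."""
    # utf-8-sig strips a leading BOM when present and otherwise
    # behaves like plain utf-8, so files without a BOM still load.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


# Write a small file the way Excel would: BOM first, then UTF-8 text.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")  # hypothetical file name
    with open(path, "wb") as handle:
        handle.write(b"\xef\xbb\xbf")
        handle.write("monster,role\r\nDracula,Nurſe\r\n".encode("utf-8"))
    rows = read_rows(path)

print(rows)
```

Because the encoding is now stated explicitly, the platform default no longer applies, which would probably also address the garbled non-ASCII characters if they come from a default-encoding mismatch.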

A BOM in UTF-8 is of course ridiculous though, and only a cause of problems (like this one...). I had issues in the past, in this tool or some other code I worked on, because some Microsoft application put a BOM in UTF-8 XML documents. Really, Microsoft should read the Wikipedia article on this subject, because even that knows better than they do. https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
