Change CSV outputs to include BOM

Per the discussion on Stack Overflow (https://stackoverflow.com/questions/25788037/pandas-df-to-csvfile-csv-encode-utf-8-still-gives-trash-characters-for-min), CSV files written through pandas with a plain UTF-8 encoding still display garbled non-ASCII characters when they are opened in Excel. I have observed the same behavior myself. The solution to this problem is to use

encoding="utf-8-sig"

instead of

encoding="utf-8"

as this will prepend the byte-order mark (BOM) that Excel checks to decide whether it should read the file as UTF-8. The following lines of collation.py should be changed accordingly:

# If this is a long table, then do not include row indices:
if long_table:
    return df.to_csv(file_addr, encoding="utf-8", index=False, **kwargs)
return df.to_csv(file_addr, encoding="utf-8", **kwargs)
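
For reference, here is a minimal sketch of what the change could look like, together with a quick check that the BOM is actually written. The names `write_csv` and `starts_with_bom` are hypothetical helpers made up for this illustration and are not part of teiphy; only the `encoding="utf-8-sig"` argument is the proposed change.

```python
import os
import tempfile

import pandas as pd

# The three bytes that make up the UTF-8 byte-order mark.
UTF8_BOM = b"\xef\xbb\xbf"


def write_csv(df, file_addr, long_table=False, **kwargs):
    """Hypothetical stand-in for the collation.py lines quoted above."""
    # If this is a long table, then do not include row indices:
    if long_table:
        return df.to_csv(file_addr, encoding="utf-8-sig", index=False, **kwargs)
    return df.to_csv(file_addr, encoding="utf-8-sig", **kwargs)


def starts_with_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(len(UTF8_BOM)) == UTF8_BOM


if __name__ == "__main__":
    # Made-up sample data containing non-ASCII text.
    df = pd.DataFrame({"reading": ["λόγος", "θεός"]}, index=["W1", "W2"])
    with tempfile.TemporaryDirectory() as tmp_dir:
        out_path = os.path.join(tmp_dir, "out.csv")
        write_csv(df, out_path)
        print(starts_with_bom(out_path))
```

Anything that reads these files back as text should also use `encoding="utf-8-sig"` so that the BOM is stripped rather than treated as part of the first field.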
