Open bogden1 opened 2 years ago
Some discoveries:
To find an affected row, run
grep -m1 -P '[^\x00-\x7F]' joined.csv
(this prints the first line containing a non-ASCII character), then use the raw_subject field to locate that row in a Sheets import.

Best guess is that Google Sheets autodetects the incoming charset in some way that does not involve parsing the whole file. Presumably it decides on this basis that our file is ASCII rather than UTF-8. As far as I know, there is no way to tell Google Sheets what the encoding is. I'll update the script to output the dummy header.
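For anyone without GNU grep's -P option, the same check can be sketched in Python. This is a minimal sketch, not part of the fix: the function name first_non_ascii_row is hypothetical, and it assumes the CSV has a header row that includes a raw_subject column. Unlike the grep command, it skips the header row and reports the spreadsheet row number directly.

```python
import csv
import io


def first_non_ascii_row(csv_file, field="raw_subject"):
    """Return (row_number, field_value) for the first data row that
    contains a non-ASCII character, or None if every row is pure ASCII.

    row_number counts the header as row 1, so it should line up with
    the row number shown in a spreadsheet import.
    """
    reader = csv.DictReader(csv_file)
    for row_number, row in enumerate(reader, start=2):
        # str() guards against None (short rows) and lists (extra fields).
        if any(not str(value).isascii() for value in row.values()):
            return row_number, row.get(field)
    return None


# Small in-memory demonstration; the column names here are made up.
sample = io.StringIO("raw_subject,notes\nplain,ok\nheat,30° outside\n")
print(first_non_ascii_row(sample))
```

For a real file, something like `open("joined.csv", encoding="utf-8", newline="")` would be passed in place of the in-memory sample.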
Tested by comparing the output of ./aggregate.py --output_dir <an output dir> -t 0.3 --row_factor 1 --no_stamp with and without the fix.
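One way to do that comparison is sketched below, assuming the two runs were written to separate output directories. The helper name differing_files is hypothetical, and the sketch only looks at the top level of each directory. The file names in the demonstration are invented, not the script's real output names.

```python
import filecmp
import os
import tempfile


def differing_files(dir_before, dir_after):
    """Return sorted names of top-level entries that differ between the
    two directories, or that exist in only one of them."""
    before = set(os.listdir(dir_before))
    after = set(os.listdir(dir_after))
    common = sorted(before & after)
    # shallow=False forces a byte-by-byte comparison of file contents.
    _, mismatch, errors = filecmp.cmpfiles(
        dir_before, dir_after, common, shallow=False
    )
    return sorted(set(mismatch) | set(errors) | (before ^ after))


# Demonstration with two throwaway directories standing in for the
# "without fix" and "with fix" output directories.
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    for directory, header in ((old, "a,b\n"), (new, "a,b,dummy\n")):
        with open(os.path.join(directory, "joined.csv"), "w", encoding="utf-8") as f:
            f.write(header + "1,2\n")
        with open(os.path.join(directory, "same.csv"), "w", encoding="utf-8") as f:
            f.write("x\n")
    print(differing_files(old, new))
```

With the fix in place, the expectation would be that only files affected by the extra header show up in the result.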
Fixed by 642ba7bca1245666ad6cc591b611c0434deacf0a
Google Sheets is mangling some relatively unusual characters (e.g. degree symbol, ellipses -- probably more). Google Sheets is the easiest way to share the output so it would be really nice to fix this.
The fallback would be to use some other technology -- though that has the risk of introducing some other new problem.
Timebox a day to fix or find an alternative.
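For reference, here is a minimal sketch of what generally happens when UTF-8 text is read with the wrong charset, using the two characters mentioned above. Whether Google Sheets does exactly this is not confirmed. The misdecode helper is hypothetical, and cp1252 is chosen only as a common single-byte charset for illustration.

```python
def misdecode(text, wrong_charset="cp1252"):
    """Encode text as UTF-8, then decode the bytes with a different
    charset, which is the usual source of mangled characters."""
    return text.encode("utf-8").decode(wrong_charset, errors="replace")


for ch in ["°", "…"]:
    # Both characters take more than one byte in UTF-8, so a
    # single-byte charset turns each of them into several characters.
    print(repr(ch), ch.encode("utf-8"), "->", repr(misdecode(ch)))

# Pure ASCII text survives unchanged, which is why most cells look fine.
print(misdecode("plain ASCII"))
```

Because ASCII-only text is unaffected, a file with only a few such characters can look correct almost everywhere.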