simonw / big-local-datasette

Publishing a Datasette of open projects from biglocalnews.org
https://biglocal.datasettes.com/

UnicodeDecodeError causes CI publish to fail #22

Closed simonw closed 4 years ago

simonw commented 4 years ago

e.g. https://github.com/simonw/big-local-datasette/runs/700400605?check_suite_focus=true

pop2018_cbsa 144733
Traceback (most recent call last):
  File "../populate_tables.py", line 85, in <module>
    populate_tables(db)
  File "../populate_tables.py", line 79, in populate_tables
    db[table_name].insert_all(url_to_dicts(url=row["uri"]))
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/sqlite_utils/db.py", line 1030, in insert_all
    for chunk in chunks(itertools.chain([first_record], records), batch_size):
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/sqlite_utils/db.py", line 1325, in chunks
    for item in iterator:
  File "../populate_tables.py", line 15, in url_to_dicts
    for row in reader:
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/csv.py", line 111, in __next__
    row = next(self.reader)
  File "../populate_tables.py", line 14, in <genexpr>
    reader = csv.DictReader(line.decode("utf-8") for line in response.iter_lines())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 9: invalid continuation byte
##[error]Process completed with exit code 1.

simonw commented 4 years ago

Quick fix: use line.decode("utf-8", errors="ignore")
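
A minimal sketch of what that quick fix does, assuming the CSV arrives as an iterable of byte strings (as response.iter_lines() yields in the traceback above). The helper name lines_to_dicts and the sample rows are hypothetical, made up for illustration; they are not taken from populate_tables.py.

```python
import csv


def lines_to_dicts(byte_lines, errors="ignore"):
    """Yield one dict per CSV row from an iterable of byte strings.

    Hypothetical helper: it mirrors the generator expression shown in the
    traceback, but passes an error handler to decode() so that bytes which
    are not valid UTF-8 no longer raise UnicodeDecodeError.
    """
    decoded = (line.decode("utf-8", errors=errors) for line in byte_lines)
    yield from csv.DictReader(decoded)


# Made-up sample data: the first data row contains the byte 0xf1, which is
# not valid UTF-8 here because it is not followed by continuation bytes.
sample = [b"name,pop", b"Ca\xf1on City,100", b"Denver,200"]

rows = list(lines_to_dicts(sample))
print(rows[0]["name"])  # the 0xf1 byte is dropped: "Caon City"
```

Note that errors="ignore" silently drops the offending bytes, so the affected values lose characters. Passing errors="replace" instead would keep a visible U+FFFD marker in their place. A byte such as 0xf1 often means the file is in a single-byte encoding like Latin-1 (where 0xf1 is "ñ"), but that is only a guess about this particular dataset.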