rchekaluk closed this issue 11 years ago.
Error when Billy scrapes this page:
02:05:18 INFO scrapelib: GET - http://philadelphiacitycouncil.net/council-members/councilwoman-maria-d-quinones-sanchez-7th-district/councilwoman-maria-d-quinones-sanchez-contact/
Traceback (most recent call last):
  File "/u/apps/virtualenvs/billy/src/billy/billy/ext/ansistrm.py", line 56, in emit
    stream.write(message)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 45: ordinal not in range(128)
Logged from file __init__.py, line 177
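The traceback points at the log handler in ansistrm.py calling stream.write() with a unicode message. A likely (but unconfirmed) explanation is that the output stream's encoding was ASCII, as happens when Python 2 runs without a UTF-8 locale, for example under cron. The minimal Python 3 sketch below reproduces the same class of error with an in-memory stream. write_message is a hypothetical helper for illustration, not part of billy or scrapelib.

```python
import io


def write_message(message, encoding="ascii", errors="strict"):
    """Write message to an in-memory text stream that uses `encoding`.

    Hypothetical helper: it mimics a logging handler calling
    stream.write(message) on a console whose encoding is ASCII.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(message)  # non-ASCII text fails to encode here under strict ASCII
    stream.flush()
    return raw.getvalue()


msg = "GET - councilwoman-maria-d-qui\u00f1ones-sanchez"
try:
    write_message(msg)
except UnicodeEncodeError as exc:
    # exc.start is the index of the offending character within the message
    print(exc.encoding, exc.reason, exc.start, repr(exc.object[exc.start]))

# The same message written to a UTF-8 stream goes through without error.
print(write_message(msg, encoding="utf-8"))
```

The "position 45" in the real log is simply the index of ñ within the full formatted log message, just as exc.start is here.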
The page contains some obvious non-ASCII characters, and the first one is indeed Unicode code point U+00F1 (ñ), the character named in the error.
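To list the non-ASCII characters in a piece of text along with their code points, a short sketch could look like the following. non_ascii_chars is a hypothetical helper, and the sample string is illustrative rather than the actual scraped page text.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (index, char, 'U+XXXX', Unicode name) for each non-ASCII character."""
    return [
        (i, ch, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


for item in non_ascii_chars("Qui\u00f1ones-S\u00e1nchez"):
    print(item)
```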
I'm not certain, but I wonder whether Billy is restricted to the ASCII character set. Searching the source for "ascii" turns up several hard-coded ASCII encodings:
billy$ find . -type f -print | xargs grep ascii | egrep -v git
./billy/importers/bills.py: r.encode('ascii', 'replace') for r in remaining]))
./billy/importers/committees.py: logger.debug("No matches for %s" % member['name'].encode('ascii',
./billy/web/api/emitters.py: ensure_ascii=False)
./billy/web/api/emitters.py: return obj.encode("ascii", "replace")
./billy/scrape/bills.py: return filename.encode('ascii', 'replace')
./billy/scrape/legislators.py: return filename.encode('ascii', 'replace')
./billy/scrape/legislators.py: return filename.encode('ascii', 'replace')
./billy/utils/fulltext.py: text = text.encode('ascii', 'ignore')
./billy/utils/fulltext.py: text = text.decode('utf8', 'ignore').encode('ascii', 'ignore')
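Most of these hits pass the 'replace' or 'ignore' error handler, which makes the encoding lossy but never raises. The crash in the traceback instead comes from a strict (default) encode. The Python 3 sketch below contrasts the three behaviours on an illustrative name string; it is not taken from billy's code.

```python
name = "Qui\u00f1ones-S\u00e1nchez"  # illustrative sample, not scraped data

# Strict encoding (the default) raises, like the failing stream.write() did.
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict: ", exc.reason)  # ordinal not in range(128)

# The handlers used in the grep hits never raise; they drop or mask characters.
print("replace:", name.encode("ascii", "replace"))  # b'Qui?ones-S?nchez'
print("ignore: ", name.encode("ascii", "ignore"))   # b'Quiones-Snchez'
```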
Fixed in billy: https://github.com/opengovernment/billy/commit/8fd1500ec842840950606b28d1e2b8d74abcbaec

Reference: http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20/9942885#9942885
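One common way to avoid this class of error is to give the log handler a stream with an explicit UTF-8 encoding rather than relying on the console's default. The Python 3 sketch below shows that approach with the standard logging module; it is an assumption about a reasonable fix, not a description of what the billy commit actually changed, and the logger name "scrapelib-demo" and helper make_utf8_logger are hypothetical.

```python
import io
import logging


def make_utf8_logger(raw_stream):
    """Return a logger whose only handler writes UTF-8 bytes to raw_stream."""
    # newline="" disables newline translation so output is identical on all platforms
    text_stream = io.TextIOWrapper(raw_stream, encoding="utf-8", newline="",
                                   write_through=True)
    handler = logging.StreamHandler(text_stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger("scrapelib-demo")
    logger.handlers[:] = [handler]  # replace any previously attached handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False        # keep the demo output off the root logger
    return logger


raw = io.BytesIO()
logger = make_utf8_logger(raw)
logger.info("GET - councilwoman-maria-d-qui\u00f1ones-sanchez")
print(raw.getvalue().decode("utf-8"))
```

Where changing code is not an option, setting the PYTHONIOENCODING=utf-8 environment variable before starting the scraper is another way to make the standard streams encode as UTF-8.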