leondz / cavat

Automatically exported from code.google.com/p/cavat

Browse sometimes breaks with certain character encodings #81

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
cavat> browse doc 3
# Now browsing document id 3 in this corpus (eng-WL-11-174596-12957493.sgm)
cavat> browse sentence 1
Deposed Iraqi dictator Saddam Hussein on Tuesday accused the Americans and Israelis of wanting him dead, during a nationalist tirade directed against the United States and the judge in his Baghdad trial.
cavat> browse sentence 2
Traceback (most recent call last):
  File "./cavat.py", line 910, in <module>
    print db.cursor.fetchone()[0].decode('utf-8')
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)

Original issue reported on code.google.com by l...@dcs.shef.ac.uk on 3 Aug 2011 at 11:08
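
A note on the traceback above: a UnicodeEncodeError raised from inside a `.decode('utf-8')` call usually means the value is already a `unicode` object. In Python 2, calling `.decode()` on a `unicode` object first encodes it implicitly with the ASCII codec, which fails on the first non-ASCII character (here U+201C, a left double quotation mark, at position 0). That would also explain why sentence 1, which is plain ASCII, prints without error. The sketch below is one possible workaround, not the project's actual fix; `to_text` is a hypothetical helper name, and it assumes the database driver may return either byte strings or unicode objects.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding only when it is a byte string.

    Hypothetical helper, not part of cavat. On Python 2, `bytes` is an
    alias for `str`, so unicode objects pass through untouched and the
    implicit ASCII encode that triggers the error is never attempted.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

Under that assumption, the failing line in `cavat.py` could become `print to_text(db.cursor.fetchone()[0])`. If standard output is not UTF-8 capable (for example when piped), Python 2 may still raise a UnicodeEncodeError at the `print` itself, so the output may additionally need an explicit `.encode('utf-8')`.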