reggiegentle closed this issue 9 years ago
In what context are you using a non-ASCII character?
If the character is part of the input file, you have to specify the encoding in the ICD as described in http://roskakori.github.io/cutplace/writing-an-icd.html#data-formats
If the character is part of the ICD (e.g. a comment in the ICD) there are no special steps required unless you use plain CSV as format for the ICD itself. In this case, specify the encoding of the ICD using `--icd-encoding` as described in http://roskakori.github.io/cutplace/command-line-usage.html#icds-containing-non-ascii-characters.
In my experience the most common encodings for western languages are `cp1252`, `iso-8859-15` and `utf-8`.
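If you are not sure which of these encodings your input file uses, a quick check in plain Python can narrow it down. This is only a rough sketch and not part of cutplace: the helper `first_working_encoding` and the candidate order are my own assumptions. A successful decode only means the bytes are valid in that encoding, not that the resulting text is what the author intended, and `iso-8859-15` accepts any byte, so it makes sense to try it last.

```python
# Candidate encodings, strictest first. iso-8859-15 never fails to decode,
# so it has to come last (assumed order, adjust for your data).
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "iso-8859-15")


def first_working_encoding(data, candidates=CANDIDATE_ENCODINGS):
    """Return the first encoding that decodes ``data`` without error.

    Hypothetical helper for illustration; returns None if nothing fits.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


# The same character is stored as different bytes depending on the encoding,
# which is why cutplace needs to be told which one the file uses.
euro = "\u20ac"
for encoding in CANDIDATE_ENCODINGS:
    print(encoding, euro.encode(encoding))
```

To use it on a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the helper, then put the result into the ICD's encoding setting.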
The code base for 0.8.0 should be a lot more solid concerning `UnicodeError`, but there is still a need to test this consistently. The goal is that under Python 3 all `UnicodeError`s result in an at least somewhat helpful `DataError` that points to the cell (for Excel and ODS) or line (for delimited and fixed) that caused the error. With Python 2 this goal might be more troublesome to reach, so maybe a few cases still remain a major PITA - which is just another reason to upgrade to Python 3.
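To illustrate the idea of turning a `UnicodeError` into an error that names the offending line, here is a minimal sketch in plain Python 3. It is not cutplace's actual code: the `DataError` class below is a hypothetical stand-in for the real one, and `decoded_lines` is a made-up helper that assumes an ASCII-compatible encoding so that splitting the raw bytes on newlines is safe.

```python
import io


class DataError(Exception):
    """Hypothetical stand-in for cutplace's DataError, carrying a line number."""

    def __init__(self, message, line_number):
        super().__init__("line {0}: {1}".format(line_number, message))
        self.line_number = line_number


def decoded_lines(binary_stream, encoding):
    """Yield decoded lines; report the line number if decoding fails.

    Assumes an ASCII-compatible encoding (utf-8, cp1252, iso-8859-15, ...),
    because the stream is split on newline bytes before decoding.
    """
    for line_number, raw_line in enumerate(binary_stream, start=1):
        try:
            yield raw_line.decode(encoding).rstrip("\r\n")
        except UnicodeError as error:
            message = "cannot decode data as {0}: {1}".format(encoding, error)
            raise DataError(message, line_number) from error


# Line 2 contains byte 0xE9, which is an "e" with acute accent in cp1252
# but is not valid utf-8 at that position.
data = b"name\ncaf\xe9\n"
print(list(decoded_lines(io.BytesIO(data), "cp1252")))
try:
    list(decoded_lines(io.BytesIO(data), "utf-8"))
except DataError as error:
    print(error)
```

Decoding line by line instead of letting the file object decode everything is what makes it possible to attach a location to the error.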
We should consider exhaustive testing for one of the next sprints.
With the test improvements of the past weeks, all readers in `rowio` and `validio` seem to be able to process Unicode, so I consider this solved. If something still slipped through the cracks, we'll address this when it shows up.
cutplace seems to bomb out when non-ASCII characters are involved in a source file, i.e. if you have an e with a tilde over it, cutplace just bombs out. Is this as designed, could it be addressed, or is there a potential workaround?