Open ericmtsai opened 4 years ago
Capture this as part of the error checking? Not sure exactly what step this is in the process.
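Yes, error-checking should include a check of the encoding. Use
import pandas as pd
try:
    df = pd.read_csv('Candidate.txt', sep='\t')
except UnicodeDecodeError:
    # a failed read raises a decode error, not an encode error
    df = None  # report the problem and ask the user for the encoding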
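A minimal sketch of what such an up-front check could look like, using only the standard library. The function name `find_decode_error` and the sample line are hypothetical, not taken from the project. The idea is to report where strict decoding first fails instead of letting the traceback reach the user.

```python
from typing import Optional, Tuple


def find_decode_error(raw: bytes, encoding: str = "utf-8") -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first undecodable byte, or None if raw decodes cleanly."""
    try:
        raw.decode(encoding)  # strict error handling is the default
    except UnicodeDecodeError as exc:
        return exc.start, raw[exc.start]
    return None


# Hypothetical sample line: "Núñez" stored as iso-8859-1 bytes.
sample = "Maria Núñez\t138392\n".encode("iso-8859-1")

print(find_decode_error(sample, "utf-8"))       # offset and value of the first bad byte
print(find_decode_error(sample, "iso-8859-1"))  # None
```

Note that this only catches the case where decoding raises. A wrong encoding that happens to decode without error passes this check.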
Since we support English and Spanish letters now with the current default encoding iso-8859-1, my thinking is that this will be fine for the beta release. I plan to push #54 off at least until I have the other issues addressed.
Stephanie 11:40 AM Sounds reasonable. I’m curious: is the difficulty in the interactive piece allowing the user to specify the encoding? If not, where is the difficulty?
eric 11:43 AM ... the larger issue is that these encoding errors can fail silently OR they can throw the error. If they fail silently, then they'll get written to the database as junk letters/characters. So it will be challenging to detect both possibilities. Additionally, from what I've read, it's challenging to detect the actual file encoding at runtime. So this all led me to put it aside for now
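To make the two failure modes concrete, here is a small sketch. The helper names and the sample name are illustrative only. It shows one mismatch that raises and one that decodes "successfully" into junk, plus a rough heuristic for the second case: re-encode the text as iso-8859-1 and see whether the bytes are valid utf-8. This heuristic is an assumption on my part, not something the project does, and it covers only this one kind of mix-up.

```python
def try_decode(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, returning the text or a short marker if decoding raises."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return "<UnicodeDecodeError>"


def looks_like_utf8_read_as_latin1(text: str) -> bool:
    """Heuristic: True if text is probably utf-8 data that was decoded as iso-8859-1."""
    if text.isascii():
        return False  # pure ASCII reads the same in both encodings
    try:
        text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return True


name = "Núñez"

# Loud failure: iso-8859-1 bytes read as utf-8 raise an error.
loud = try_decode(name.encode("iso-8859-1"), "utf-8")

# Silent failure: utf-8 bytes read as iso-8859-1 decode without error, into junk.
silent = try_decode(name.encode("utf-8"), "iso-8859-1")

print(loud)    # <UnicodeDecodeError>
print(silent)  # NÃºÃ±ez
print(looks_like_utf8_read_as_latin1(silent))  # True
print(looks_like_utf8_read_as_latin1(name))    # False
```

Because iso-8859-1 assigns a character to every possible byte, reading with it never raises in Python, so with the current default any mismatch falls into the silent category.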
Stephanie 11:46 AM If the junk characters are consistent, that might not constitute failure. I can live with a system that reports something like “138,392 votes for Maria Nu!@#$@ez in Congressional District 5”, since a user with local knowledge would likely be able to parse the answer.
Since code now assumes iso-8859-1, no need to address this further for Beta Release.
Ugly fail on encoding error in Candidate.txt file; should fail gracefully and allow user to enter encoding; also we should assume iso-8859-1 (which, unlike utf-8, decodes the Spanish characters in this file without error).
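One way the "fail gracefully and allow user to enter encoding" behavior could be sketched, using only the standard library. The function name `read_text_with_prompt` and its `ask` parameter are hypothetical, and the real interactive piece may look quite different. `ask` defaults to the built-in `input`, so it can be swapped for a stub in tests.

```python
from pathlib import Path
from typing import Callable, Optional


def read_text_with_prompt(
    path: str,
    encoding: str = "iso-8859-1",
    ask: Callable[[str], str] = input,
) -> Optional[str]:
    """Decode a file, asking for another encoding instead of crashing on failure.

    Returns the decoded text, or None if the user gives up by entering nothing.
    """
    raw = Path(path).read_bytes()
    while True:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError) as exc:
            # LookupError covers an encoding name Python does not recognise.
            print(f"Could not read {path} as {encoding!r}: {exc}")
            encoding = ask("Enter the file's encoding (blank to cancel): ").strip()
            if not encoding:
                return None
```

A call such as `read_text_with_prompt('Candidate.txt')` would return the decoded text, which could then be handed to `pd.read_csv` through `io.StringIO`. With the iso-8859-1 default the prompt never triggers, since that codec accepts every byte; it only comes into play when a stricter encoding such as utf-8 is tried.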