quentinsf / icsv2ledger

Interactive importing of CSV files to Ledger

Handle CSV files with non-UTF-8 encodings, or users with non-UTF-8 locales #81

Closed garyp closed 9 years ago

garyp commented 9 years ago

I have some CSV files from PayPal that have non-ASCII payee names and are encoded in ISO-8859-1. These names get mangled with the current version of icsv2ledger running on my computer (with an en_US.UTF-8 locale).

I tried fixing this in python2 by converting strings to/from Unicode as needed, and it was turning into a very painful mess, not least because the csv module in python2 doesn't support Unicode. Switching to python3 ended up being a much saner approach that required minimal changes to add proper Unicode handling. I hope upping icsv2ledger's python requirement is not a deal-breaker.
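To illustrate why python3 makes this easier, here is a minimal sketch (not the code from this pull request) of reading an ISO-8859-1 CSV file into proper Unicode strings. The function name `read_csv_rows` and the default encoding are assumptions made for the example.

```python
import csv
import io


def read_csv_rows(path, encoding="iso-8859-1"):
    """Return every row of a CSV file as a list of str, decoded with `encoding`.

    Hypothetical helper for illustration; not part of icsv2ledger.
    """
    # newline="" leaves newline handling to the csv module, as its docs recommend.
    with io.open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))
```

Under python3 the csv module works on text, so the only encoding decision is made once, when the file is opened.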

NOTE: I assume that all input and output files other than the CSV file (i.e. templates, mappings, ledgers, etc.) are in UTF-8. If the user has been using icsv2ledger in a non-UTF-8 locale with non-ASCII CSV files then they will have to manually convert their template and mapping files (and possibly ledger files, if they haven't been keeping them in UTF-8 as they should) to UTF-8 encoding before updating to this version of icsv2ledger.
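For anyone who needs to do that manual conversion, something along these lines should work. This is only a sketch: `convert_to_utf8` is a hypothetical helper, and the source encoding has to be whatever the file was actually saved in (ISO-8859-1 is just an assumed default here).

```python
import io


def convert_to_utf8(src_path, dst_path, source_encoding="iso-8859-1"):
    """Re-encode a text file to UTF-8, leaving line endings untouched.

    Hypothetical helper; `source_encoding` must match the file's real encoding.
    """
    # newline="" disables newline translation on both read and write.
    with io.open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Writing to a separate destination path keeps the original file around until the result has been checked.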

quentinsf commented 9 years ago

Gary - thanks for this - it all looks good, but I'd be a bit concerned about a full switch to python3. I make my living from Python coding, but not one of the dozens of machines I own and manage even has v3 installed on it yet...

I certainly don't have the religious objections to it that some people do, and I'm all in favour of making code python3-compatible where possible, but I wonder whether more people would be inconvenienced by unicode in their statements, or by having to install python3...

Having said all that, even though I originated the project, it's quite some time since I've been an active user of it, so I will happily let others decide whether or not to merge it as it currently stands! They may all be on v3 already :-)

Quentin

petdr commented 9 years ago

I'm happy to make the move to python3. I'll just let this sit for a week and if no-one objects then I'll merge this.

garyp commented 9 years ago

It just occurred to me that it wouldn't be very hard to create a version of this pull request that maintains compatibility with python2. That is, rather than adding proper Unicode support to the python2 version (which is what I tried earlier and ran into myriad issues), we can have a version that works as-is in python2 but properly supports Unicode if run under python3.

This would mean that any future changes to icsv2ledger will need to be made with both python2 and python3 compatibility in mind. If that's an acceptable burden, I should have some time next week to re-add python2 compatibility to this pull request.
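Roughly, the dual-version approach could look like the sketch below. It is an illustration under assumptions, not the actual change: `open_csv` and `read_rows` are made-up names, and the python2 branch simply keeps the old byte-oriented behaviour.

```python
import csv
import sys

PY3 = sys.version_info[0] >= 3


def open_csv(path, encoding="utf-8"):
    """Open a CSV file the way each interpreter's csv module expects."""
    if PY3:
        # python3: decode to text up front; csv.reader then yields str.
        return open(path, "r", encoding=encoding, newline="")
    # python2: csv works on bytes, so behave as before (no decoding).
    return open(path, "rb")


def read_rows(path, encoding="utf-8"):
    f = open_csv(path, encoding)
    try:
        return list(csv.reader(f))
    finally:
        f.close()
```

The cost is the one mentioned above: every later change has to be checked against both branches.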

petdr commented 9 years ago

How do you write the #! so that it uses python3 in preference to python2?

garyp commented 9 years ago

@petdr I'm not sure how to do that. I was thinking of just changing it back to "python" and letting the user's PATH dictate which version they get. This will be python2 for the majority of users, though, even if they have python3 installed, unless they've gone to the trouble of using something like pyenv.
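If the shebang goes back to plain "python", one option is for the script to check at startup which interpreter it got. The sketch below assumes a hypothetical `unicode_supported` helper and warning text; it does not make python3 preferred, it only tells the user what they ended up with.

```python
#!/usr/bin/env python
import sys


def unicode_supported(version_info=sys.version_info):
    """Return True when running under python3 or later (hypothetical helper)."""
    return version_info[0] >= 3


if not unicode_supported():
    # Assumed wording; under python2 non-ASCII CSV data is passed through as raw bytes.
    sys.stderr.write("warning: running under python2, no Unicode handling\n")
```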

petdr commented 9 years ago

I have no strong preferences about which python, and no objections, so in it goes.

peterdc commented 9 years ago

Unfortunately the switch to python3 breaks some extensions I had made linking to my scanner. The problem appears buried in the scanner support library and is beyond my ken to debug. I suppose it's too late in the game, but if we could support both python 2 & 3 as @garyp suggested, I could continue on without the additional Unicode support.