jdunck / python-unicodecsv

Python 2's stdlib csv module is nice, but it doesn't support unicode. This module is a drop-in replacement which *does*. If you prefer Python 3's semantics but need support in Python 2, you probably want https://github.com/ryanhiebert/backports.csv

Python 3 fallback to builtin module when no encoding given #67

Closed: ryanhiebert closed this 8 years ago

ryanhiebert commented 8 years ago

unicodecsv is a drop-in replacement for the csv module on Python 2. On Python 3 that model doesn't work as well, because of the strict distinction between bytes and text. However, it still surprises people that unicodecsv isn't a drop-in replacement for Python 3 as well. #65 demonstrates this surprise.

To make this possible, on Python 3 unicodecsv will fall back to the built-in csv module when no encoding argument is given. This allows code written for Python 3's csv module to work without change.
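A minimal sketch of the dispatch described above, for illustration only: the function name `reader` and its parameters mirror the csv API, but this is a hypothetical stand-in, not the code in this PR. With no `encoding`, the stream is assumed to be text and goes straight to the stdlib; with one, it is assumed to be binary and is decoded with `io.TextIOWrapper` before parsing.

```python
import csv
import io


def reader(f, dialect="excel", encoding=None, **fmtparams):
    """Hypothetical sketch of the proposed Python 3 fallback (not unicodecsv's code).

    encoding is None -> f is assumed to be a text stream; behave like csv.reader.
    encoding given   -> f is assumed to be a binary stream; decode it first.
    """
    if encoding is None:
        # Fallback: plain stdlib behaviour, so code written for Python 3's csv works unchanged.
        return csv.reader(f, dialect, **fmtparams)
    # unicodecsv-style call: bytes in, decoded at this layer before the csv module sees them.
    # newline="" leaves line-ending handling to the csv module.
    text = io.TextIOWrapper(f, encoding=encoding, newline="")
    return csv.reader(text, dialect, **fmtparams)
```

Note that `io.TextIOWrapper` takes ownership of the binary stream: closing (or garbage-collecting) the wrapper also closes `f`.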

However, it comes at a price. If you're relying on the default utf-8 encoding of unicodecsv on Python 2, you'll get strange, hard-to-decipher errors on Python 3.
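To illustrate the price, here is a sketch of what code written for unicodecsv on Python 2 runs into under this fallback, assuming it opened files in binary mode and relied on the utf-8 default (so it never passed `encoding`). `io.BytesIO` stands in for a file opened with `"rb"`/`"wb"`; the helper names are hypothetical.

```python
import csv
import io

# Typical Python 2 unicodecsv usage relied on the implicit utf-8 default:
#     rows = unicodecsv.reader(open("data.csv", "rb"))
# With the fallback, no `encoding` means the binary stream reaches the
# stdlib csv module unchanged.


def read_binary_without_encoding(data):
    # csv.reader expects an iterator of str; a binary stream yields bytes.
    return list(csv.reader(io.BytesIO(data)))


def write_binary_without_encoding(row):
    # csv.writer writes str to the stream; a binary stream only accepts bytes.
    csv.writer(io.BytesIO()).writerow(row)
```

The reader fails with a `csv.Error` complaining that the iterator returned bytes instead of strings (the exact wording varies by Python version), and the writer fails with a `TypeError` raised by the binary stream. Neither message mentions unicodecsv or the missing `encoding` argument.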

ryanhiebert commented 8 years ago

@jdunck: I don't know if this is the better solution, or if we just need to say that the file must be opened in binary mode on Python 3. I don't like that this splits the world and creates hard-to-debug edge cases. However, I think it may help with some very simple cases, such as the one demonstrated in #65. I don't know if it's worth the trade-off, but I wanted you to see what changes it would take to choose this trade.

jdunck commented 8 years ago

I'd rather be explicit about how we expect this module to be used. The cliff bug came from an implicit expectation that happened to work before.

At the time I initially wrote the module, py3 may have existed but was very early in terms of community usage. I really just meant it to make unicode CSV processing easier in py2. Obviously the world has moved on, and I think py3's approach to encoding is clean and reasonable. It's different, so it does require some learning and adaptation. I can see that this library could continue to be a useful shim for dual 2/3 support, but I would also hope that, unless the intention is to keep supporting 2/3 from a single codebase, people would then migrate to native py3 csv.

With that in mind, I'd prefer to document:

1) how to use unicodecsv as a 2/3 shim (noting expectations of stream details and at which layer encoding is handled).
2) how to migrate code that uses unicodecsv to py3's native unicode support in the csv module.
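For the second item, a minimal sketch of the migration, assuming UTF-8 files and Python 3's stdlib `csv`. The commented "before" shows the unicodecsv style (binary file, `encoding` passed to the csv layer); the "after" moves the encoding to `open()` and passes `newline=""`, as the Python 3 csv documentation recommends. `write_rows` and `read_rows` are illustrative names.

```python
import csv

# Before (unicodecsv): binary file, encoding handled by the csv layer.
#     with open(path, "wb") as f:
#         unicodecsv.writer(f, encoding="utf-8").writerow([u"caf\xe9"])
#
# After (Python 3 stdlib): text file, encoding handled by open().
# newline="" lets the csv module control line endings itself.


def write_rows(path, rows, encoding="utf-8"):
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


def read_rows(path, encoding="utf-8"):
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))
```

The csv calls themselves barely change; what moves is the layer where bytes become text.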

The 0.14.1 change is a breaking change for what I'd consider edge cases, and I'd say we should note that people who prefer the old behavior should either stick to 0.13.0 or update their code to be more explicit.

@ryanhiebert, do you agree with that approach for this library? If so, do you want to attempt the docs, or want me to?

ryanhiebert commented 8 years ago

I do agree with that approach. We'll have to document that some versions just proxied the csv module, and that this was later judged to be the wrong approach to Python 3 compatibility.

I'm willing to write some docs, though I can't promise exactly when that'll happen. So I'd say "whoever gets to it first". I think I can probably get something going sometime this week.

One thing worth mentioning is that if I were to create an ideal Python 2/3 compatible csv module, I'd probably go full-bore with the Python 3 way of doing things and merely backport it to Python 2. My goal with writing the Python 3 compatibility was to avoid rewriting the code that I'd already done with unicodecsv. I may still make a backport of Python 3 csv to Python 2.
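One way to get "the Python 3 way" on both interpreters is to import a backport on Python 2 and the stdlib module on Python 3, keeping all data as text. A sketch, assuming the backports.csv package linked in the repository description is installed on Python 2 and exposes the Python 3 API as `backports.csv`; the helper names are hypothetical.

```python
import io
import sys

if sys.version_info[0] < 3:
    # Assumption: backports.csv is installed and provides Python 3's csv API on Python 2.
    from backports import csv
else:
    import csv


def rows_to_text(rows):
    """Serialize rows to a text string; no encoding happens at the csv layer."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def text_to_rows(text):
    """Parse rows from a text string; decoding is the caller's job (e.g. io.open)."""
    return list(csv.reader(io.StringIO(text, newline="")))
```

Because both branches share Python 3's text-only API, the rest of the code needs no version checks; on Python 2 the rows should be unicode strings.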

jdunck commented 8 years ago

Right, now that py3 is more widely used and its design is clearer and more explicit, I'd agree that a py2 module offering py3-compatible semantics would be nice. I just don't think it can be this library (unless shipped as a submodule).

ryanhiebert commented 8 years ago

Agreed. Thanks for your input.