Closed ryanhiebert closed 8 years ago
@jdunck: I don't know if this is the better solution, or if we just need to say that the file must be opened in binary mode on Python 3. I don't like that this splits the world and creates hard-to-debug edge cases. However, I think that it may help with some very simple cases, such as the one demonstrated in #65. I don't know if it's worth the trade-off, but I wanted you to see what changes it would take to make this trade.
I'd rather be explicit about how we expect this module to be used. The -cliff bug is one of implicit expectation that happened to work before.
At the time I initially wrote the module, py3 may have existed but was very early in terms of community usage. I really just meant it to make unicode CSV processing easier in py2. Obviously the world has moved on, and I think py3's approach to encoding is clean and reasonable. It's different, so it does require some learning and adaptation. I can see that this library could continue to be a useful shim in dual-support for 2/3, but I would also hope that unless the intention is to continue supporting 2/3 from a single codebase, people would then migrate to native py3 csv.
With that in mind, I'd prefer to document:
1) how to use unicodecsv as a 2/3 shim (noting expectations of stream details and at which layer encoding is handled).
2) how to migrate code which uses unicodecsv to using py3's native support for unicode in CSVs.
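For the second point, a minimal sketch of what the py3-native side of a migration could look like, using only the standard library. The helper names write_rows and read_rows are hypothetical, chosen for illustration; the key assumption is that encoding moves out of the csv layer and into the file object, which is opened in text mode with an explicit encoding.

```python
import csv


def write_rows(path, rows, encoding="utf-8"):
    """Write rows of text with Python 3's built-in csv module.

    The file is opened in text mode; the file object, not csv,
    encodes the text to bytes. newline="" lets csv control line endings.
    """
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows(path, encoding="utf-8"):
    """Read rows back as lists of str, decoding at the file layer."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))
```

Calling write_rows("people.csv", [["name", "city"], ["Zoë", "München"]]) followed by read_rows("people.csv") should round-trip the same rows as text, with UTF-8 bytes on disk.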
The 0.14.1 change is a breaking change for what I'd consider edge cases, and I'd say we should either note that people preferring the old way should stick to 0.13.0, or update their code to be more explicit.
@ryanhiebert, do you agree with that approach for this library? If so, do you want to attempt the docs, or want me to?
I do agree with that approach. We'll have to document that we had some versions that just proxied the csv module, but that was decided to be the wrong approach to Python 3 compatibility.
I'm willing to write some docs, though I can't promise exactly when that'll happen. So I'd say "whoever gets to it first". I think I can probably get something going sometime this week.
One thing worth mentioning is that if I were to create an ideal Python 2/3 compatible csv module, I'd probably go full-bore with the Python 3 way of doing things, and merely backport it to Python 2. My goal with writing the Python 3 compatibility was to avoid re-writing the code that I'd already done with unicodecsv. I may yet still make a backport of Python 3 csv to Python 2.
Right, now that py3 is becoming used more, and its design is clearer/more explicit, I'd agree that a py2 module for py3 semantic-compat would be nice. I just don't think it can be this library (unless shipped as a submodule).
Agreed. Thanks for your input.
unicodecsv is a drop-in replacement for the csv module on Python 2. On Python 3 that model doesn't work as well, because of the strict distinction between bytes and text. However, it still surprises people that unicodecsv isn't a drop-in replacement for Python 3 as well. #65 demonstrates this surprise.

To make this possible, on Python 3 unicodecsv will fall back to the built-in csv module when no encoding argument is given. This allows code written for Python 3's csv module to work without change.

However, it comes at a price. If you're relying on the default utf-8 encoding of unicodecsv on Python 2, you'll get strange, hard-to-decipher errors on Python 3.
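
To illustrate the bytes/text distinction behind those errors, here is a small sketch using only Python 3's built-in csv module on binary streams. The function names are hypothetical, and this is only an assumption about the kind of failure that code written for binary files would run into after a fallback to the built-in module, not a reproduction of any specific report.

```python
import csv
import io


def try_write_to_binary_stream():
    """Return the exception raised when csv writes a text row to a bytes stream."""
    try:
        # The built-in writer produces str, which a binary stream rejects.
        csv.writer(io.BytesIO()).writerow(["a", "b"])
    except TypeError as exc:
        return exc
    return None


def try_read_from_binary_stream():
    """Return the exception raised when csv reads lines from a bytes stream."""
    try:
        # The built-in reader expects an iterator of str, not bytes.
        next(csv.reader(io.BytesIO(b"a,b\r\n")))
    except csv.Error as exc:
        return exc
    return None
```

Both helpers return an exception object on Python 3: a TypeError for the write and a csv.Error for the read. Neither error mentions encodings, which may be part of why such failures are hard to trace back to a missing encoding argument.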