Closed zarnovican closed 8 years ago
The commit that bumped to version 0.14.1 was made before that commit, but it was released after that commit. The commit history isn't necessarily a good record of releases. He does try to tag commits appropriately, though.
The strategy for Python 3 had been to just proxy the built-in csv module. Unfortunately, that means that it was a completely different API between Python 2 and Python 3, and I'd essentially have to write 2 versions of my CSV parsing and writing code either way. The new strategy allows you to write a single codebase that will work on Python 2 and 3.
I can't speak for him, but I suspect that the reason that this wasn't seen as a backward incompatible change is because this 1) only affected Python 3, and 2) on Python 3, unicodecsv
was just a pure, completely untested proxy to the built-in csv
module. As I mentioned above, the csv
module has different semantics from unicodecsv
, on both Python 2 and Python 3.
The change referenced added real support for Python 3, instead of just punting, because the APIs are so different.
Since your build failure is on Python 3 only, I would change your import to something like:
import sys
if sys.version_info[0] == 2:
import unicodecsv as csv
else:
import csv
This will give you the same behavior that unicodecsv
did before 0.14.1. I'll have to leave it as a decision for @jdunck whether the versions should be changed, but I do think that the way we've landed on is the better path forward. So if you're wanting the fallback to the built-in csv
module, I think that you should make it explicit in your code.
I hope that helps clarify the reasons that we switched strategies, and gives you a good path forward to making your work Python 3 compatible.
Hi Ryan,
thanks for the fast and informative feedback. I have looked at the problem a little bit deeper and manually tested several combinations (on Linux). The issue is with text vs binary file mode.
Py3,csv,text - :white_check_mark: Py3,csv,binary - :x: "TypeError: 'str' does not support the buffer interface" (is it bug or feature?) Py3,unicodecsv,text - :x: "TypeError: must be str, not bytes" Py3,unicodecsv,binary - :white_check_mark: (I assume that pre-0.14.1 unicodecsv behaves exactly like csv)
You have implemented binary-mode write, but broke text-mode in the process. My guess is that most people would use text mode on CSVs. I have also noticed that most (all?) your unittest are running tests on files opened in binary mode.
The new strategy allows you to write a single codebase that will work on Python 2 and 3.
Gotcha.. It's a compatibility module, bringing Py2's csv semantics into Py3. I believe it's a genuine bug that text-mode write does not work..
I will use your suggestion with conditional import, since python-cliff
's usage of CSV is quite minimal (mostly output to text-mode stdout).
Gotcha.. It's a compatibility module, bringing Py2's csv semantics into Py3. I believe it's a genuine bug that text-mode write does not work..
I suspect that your usage of unicodecsv
simply wasn't aware of what it was doing. In order for you to have written the code identically on Python 2 and 3, you would have written this, with the notably absent declaration of the encoding
:
import unicodecsv as csv
writer = csv.writer(open('myfile.csv', 'w'))
writer.writerow(...)
The problem with the above code is that it previously had completely different semantics on Python 2 and 3.
On Python 2, csv.writer(file)
is the same as csv.writer(file, encoding='utf-8')
. So you didn't realize it, but your file was always being written in binary mode. Python 2's loose string semantics, though, would still let you pretend like it was text. But it wasn't, it was bytes.
On Python 3 though, the built-in csv
module doesn't have an encoding
argument. That means that if I was using that argument, it wouldn't work on Python 3, because the API is different.
You're using unicodecsv
as a drop-in replacement for csv
. The problem is that it's only a reasonable drop-in replacement for Python 2 csv
. If it attempted to be a drop-in replacement for Python 3's as well, it would be a different API, as was already demonstrated. I don't see a reasonable way to make it be both compatible between Python 2 and 3 and be a drop-in replacement for both versions. That's because they have radically different semantics.
The new Python 3 work doesn't add Python 2's semantics to Python 3, but rather adds Python 3's clarity of purpose to the API presented on Python 2. The Python 2 API was the invariant that couldn't be changed, I just had to codify what that API was doing using Python 3's strict split between binary and text.
So you didn't realize it, but your file was always being written in binary mode.
Sorry, I should have been more explicit. When referring to binary vs text mode I was talking about the flag used to open the file (eg, "w" vs "wb"). Simple test (results are on my last comment):
#import unicodecsv as csv
import csv
f = open('myfile.csv', 'w')
#f = open('myfile.csv', 'wb')
ret = csv.writer(f).writerow(('a', 'b', '\u017d'))
f.close()
Py3 text-mode files have associated encoding f.encoding == 'UTF-8'
(that's the default), while binary-mode files are just a stream of bytes. I reckon, that's why Py3 csv module does not have encoding=
anymore. It is handled at file-level.
And I should have said "your file was always being written as if it were in binary mode". Since Python 2's text mode still gives byte strings (str
), it can be treated like bytes.
0.14.0 and 0.14.1 are now correctly tagged in git. Now that there's more than 1 contributor, I should be using a develop branch to make it clear what's a pre-release bump and the actual release...
I'm going to close this issue, as this bug is simply part of the design of the library. Feel free to let me know if you believe that is a mistake.
Hi guys,
Unittest on python-cliff project are failing on py34 env. This is due to recent upgrade to python-unicodecsv 0.14.1. When tested with python-unicodecsv 0.14.0 (from pip) it is passing. The commit that broke it is c68ae77dccd15a88b629ef42d882ab601626f1ee.
What's worrying is that version bump to 0.14.1 (b5301ad762c9fa8aead92a563d24b255eb7e8da9) was before that commit. If using unicodecsv from this commit, it is actually passing. So it looks like the build was made from tip of the master at some point.
Simple test: