jdunck / python-unicodecsv

Python2's stdlib csv module is nice, but it doesn't support unicode. This module is a drop-in replacement which *does*. If you prefer python 3's semantics but need support in py2, you probably want https://github.com/ryanhiebert/backports.csv

Calling writer.writerow on a list of strings in python 3.5 results in TypeError: string argument expected, got 'bytes' #70

Closed ben-en closed 8 years ago

ben-en commented 8 years ago
> /home/ben/.envs/rss/lib/python3.5/site-packages/unicodecsv/py3.py(29)writerow()
     28         import ipdb; ipdb.set_trace()
---> 29         return self.writer.writerow(row)
     30 

ipdb> p row
['http://localhost:10081/content/HiMFYeCsAfJxP9Jc/allianceearth.org', 'South African team may have solved solar puzzle even Google couldn’t crack', 'ARL', 'ephemeral', 'rss_feeder', '2015-11-30 21:53:07 ', '', '']
ipdb> n
TypeError: string argument expected, got 'bytes'
> /home/ben/.envs/rss/lib/python3.5/site-packages/unicodecsv/py3.py(29)writerow()
     28         import ipdb; ipdb.set_trace()
---> 29         return self.writer.writerow(row)
     30 

ipdb> pinfo self.writer.writerow
Docstring:
writerow(iterable)

Construct and write a CSV record from an iterable of fields.  Non-string
elements will be converted to string.
Type:      builtin_function_or_method
ipdb> [type(x) for x in row]
[<class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>]

To make extra certain that every item in the row I was submitting was unicode, I also called the following function on each of them:

def to_unicode(v, encoding='utf8'):
    """
    Convert a value to Unicode string (or just string in Py3). This function
    can be used to ensure string is a unicode string. This may be useful when
    input can be of different types (but meant to be used when input can be
    either bytestring or Unicode string), and desired output is always Unicode
    string.
    The ``encoding`` argument is used to specify the encoding for bytestrings.
    """
    if isinstance(v, str):
        return v
    try:
        return v.decode(encoding)
    except (AttributeError, UnicodeEncodeError):
        return str(v)

I'm not sure if I'm providing an incorrect argument, but I'd be happy to provide any further information to help determine what exactly is going wrong.

ryanhiebert commented 8 years ago

I'm afraid I'm not able to reproduce this. Here's my terminal session with my attempt; perhaps you can see where I did things differently:

$ pip install unicodecsv
Collecting unicodecsv
Installing collected packages: unicodecsv
Successfully installed unicodecsv-0.14.1
$ python
Python 3.5.0 (default, Sep 14 2015, 09:14:42)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import io
>>> f = io.BytesIO()
>>> import unicodecsv as csv
>>> writer = csv.writer(f)
>>> row = ['http://localhost:10081/content/HiMFYeCsAfJxP9Jc/allianceearth.org', 'South African team may have solved solar puzzle even Google couldn’t crack', 'ARL', 'ephemeral', 'rss_feeder', '2015-11-30 21:53:07 ', '', '']
>>> writer.writerow(row)
192
>>> f.seek(0)
0
>>> f.readlines()
[b'http://localhost:10081/content/HiMFYeCsAfJxP9Jc/allianceearth.org,South African team may have solved solar puzzle even Google couldn\xe2\x80\x99t crack,ARL,ephemeral,rss_feeder,2015-11-30 21:53:07 ,,\r\n']
>>>

ryanhiebert commented 8 years ago

Ah, you know what, I'll bet you're opening the file that you're writing to in text mode. unicodecsv must write to a binary-mode file, because unicodecsv itself takes care of encoding all the data to bytes.
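
To illustrate why the target's mode matters, here is a minimal sketch using only the standard library. `EncodingWriteWrapper` and `write_row` are hypothetical stand-ins invented for this example, not unicodecsv's actual code; they only assume that the library encodes each record to bytes before handing it to the file object.

```python
import csv
import io


class EncodingWriteWrapper:
    """Hypothetical stand-in: encodes text to bytes before writing."""

    def __init__(self, binary_file, encoding="utf-8"):
        self.binary_file = binary_file
        self.encoding = encoding

    def write(self, text):
        # The underlying file object receives bytes, never str.
        return self.binary_file.write(text.encode(self.encoding))


def write_row(target, row):
    """Write one CSV row to `target` through the encoding wrapper."""
    writer = csv.writer(EncodingWriteWrapper(target))
    return writer.writerow(row)


row = ["couldn\u2019t crack", "ARL", ""]

# Binary buffer: the encoded bytes are accepted.
binary_target = io.BytesIO()
write_row(binary_target, row)
binary_output = binary_target.getvalue()

# Text buffer: StringIO.write() rejects bytes with a TypeError.
text_target = io.StringIO()
try:
    write_row(text_target, row)
    text_error = None
except TypeError as exc:
    text_error = str(exc)

print(binary_output)
print(text_error)
```

Under that assumption, the same row that writes cleanly to `io.BytesIO` fails on `io.StringIO`, even though every element of the row is a `str`.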

ben-en commented 8 years ago

Hm, interesting. It does seem like that would be the most likely culprit. I'll have a look at that, thank you.

ben-en commented 8 years ago

That was the solution! I was using a StringIO object. Thanks again!