jdunck / python-unicodecsv

Python2's stdlib csv module is nice, but it doesn't support unicode. This module is a drop-in replacement which *does*. If you prefer python 3's semantics but need support in py2, you probably want https://github.com/ryanhiebert/backports.csv

Sniffer returns unicode but Reader expects bytes #35

Open taliaga opened 10 years ago

taliaga commented 10 years ago

Hi,

unicodecsv.Sniffer returns unicode delimiters when fed a unicode string, but that makes unicodecsv.reader choke with this message:

>       self.reader = csv.reader(f, dialect, **kwds)
E       TypeError: "delimiter" must be string, not unicode
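For context, here is a minimal reproduction of the underlying failure (my own sketch, Python 2 stdlib only): the TypeError comes from csv.reader itself, which unicodecsv calls internally, because the stdlib reader validates dialect attributes at construction time and rejects unicode values.

import csv
from StringIO import StringIO

try:
    # Python 2's stdlib reader refuses a unicode delimiter outright.
    csv.reader(StringIO(b'a,b\r\n'), delimiter=u',')
except TypeError as exc:
    print(exc)  # "delimiter" must be string, not unicode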

Here is a sample test that shows the problem:

import io
from StringIO import StringIO

import unicodecsv


def testUnicodeDelimiters(tmpdir):
    csv_filename = str(tmpdir / u'input.csv')

    with io.open(csv_filename, u'wb') as csv_file:
        csv_writer = unicodecsv.writer(csv_file, encoding=u'utf-8')
        csv_writer.writerow([u'Sandstone, no scavenging', u'4.4', u'3.3'])
        csv_writer.writerow([u'Sandstone, low scavenging', u'5.5', u'6.6'])

    with io.open(csv_filename, u'r', encoding=u'utf-8') as csv_file:
        data = csv_file.read()

    sniffer = unicodecsv.Sniffer()
    dialect = sniffer.sniff(data)
    #dialect.delimiter = bytes(dialect.delimiter)
    #dialect.quotechar = bytes(dialect.quotechar)
    reader = unicodecsv.reader(StringIO(data), dialect=dialect)
    contents = list(reader)
    assert contents == [
        [u'Sandstone, no scavenging', u'4.4', u'3.3'],
        [u'Sandstone, low scavenging', u'5.5', u'6.6'],
    ]
    assert {type(s) for s in contents[0]} == {unicode}
    assert {type(s) for s in contents[1]} == {unicode}

The problem only seems to occur when the delimiter appears inside one of the quoted input strings ("Sandstone, no scavenging").

Please note that uncommenting the two commented-out lines above makes the test pass.
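For anyone hitting this in the meantime, a self-contained sketch of the same workaround (my own, not part of unicodecsv's API), assuming only delimiter and quotechar need coercing back to byte strings after sniffing:

import unicodecsv
from StringIO import StringIO

data = (u'"Sandstone, no scavenging",4.4,3.3\r\n'
        u'"Sandstone, low scavenging",5.5,6.6\r\n')

dialect = unicodecsv.Sniffer().sniff(data)
# Sniffer was fed unicode, so its dialect attributes come back as unicode too;
# encode them so the underlying Python 2 csv.reader accepts them.
if isinstance(dialect.delimiter, unicode):
    dialect.delimiter = dialect.delimiter.encode('ascii')
if isinstance(dialect.quotechar, unicode):
    dialect.quotechar = dialect.quotechar.encode('ascii')

rows = list(unicodecsv.reader(StringIO(data), dialect=dialect))
assert rows[0] == [u'Sandstone, no scavenging', u'4.4', u'3.3']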

Kind Regards,

jdunck commented 9 years ago

This seems related to #36. I'm curious if anyone else has run into this problem -- I suspect Sniffer is little-used.