python-bonobo / bonobo

Extract Transform Load for Python 3.5+
https://www.bonobo-project.org/
Apache License 2.0

Support encoding in CsvReader (already there, make sure documentation shows it). #216

Closed: mrecht closed this issue 6 years ago

mrecht commented 7 years ago

It would be helpful if the CsvReader class supported the encoding parameter, as the FileReader class does. Most of our files are encoded as cp1252 and contain German umlauts, so they cannot be loaded with the default UTF-8 encoding. My current workaround is to use the FileReader and split the lines myself.
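To make the failure concrete, here is a minimal standard-library sketch (plain Python, not bonobo code) of what happens when a cp1252-encoded CSV file containing umlauts is decoded as UTF-8, and how passing the file's real encoding to open() fixes it. The file name and sample rows are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample data containing German umlauts.
rows = [["name", "city"], ["Jürgen", "München"], ["Gößl", "Köln"]]

def write_cp1252_csv(path):
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="cp1252", newline="") as f:
        csv.writer(f).writerows(rows)

def read_csv(path, encoding):
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.csv")
    write_cp1252_csv(path)

    # cp1252 stores "ü" as the single byte 0xFC, which is invalid UTF-8.
    try:
        read_csv(path, "utf-8")
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Opening the file with its actual encoding decodes it correctly.
    decoded = read_csv(path, "cp1252")

print(utf8_failed)  # True
print(decoded[1])   # ['Jürgen', 'München']
```

This is the same thing an `encoding` option on a CSV reader node has to do: pick the codec at open() time, before the csv module ever sees the text.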

hartym commented 7 years ago

As CsvReader and CsvWriter are subclasses of FileHandler, they already have an "encoding" option, which is passed to the open() call for the file.
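The inheritance described here (an `encoding` option defined once on a file-handling base class and forwarded to open(), so every subclass gets it for free) can be sketched in plain Python. `FileHandlerSketch` and `CsvReaderSketch` are hypothetical stand-ins written for illustration; they are not bonobo's actual classes or implementation.

```python
import csv
import os
import tempfile


class FileHandlerSketch:
    """Hypothetical base class holding the options shared by every file node."""

    def __init__(self, path, *, encoding="utf-8"):
        self.path = path
        self.encoding = encoding

    def open(self, mode="r"):
        # Subclasses never call the built-in open() themselves; they go
        # through this method, so all of them honour the same `encoding`.
        return open(self.path, mode, encoding=self.encoding, newline="")


class CsvReaderSketch(FileHandlerSketch):
    """Hypothetical CSV reader: it adds no encoding logic of its own."""

    def read(self):
        with self.open("r") as f:
            yield from csv.DictReader(f)


# Usage: write a cp1252 file, then read it back through the subclass.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.csv")
    with open(path, "w", encoding="cp1252", newline="") as f:
        f.write("name,city\r\nJürgen,München\r\n")
    records = list(CsvReaderSketch(path, encoding="cp1252").read())

print(records[0]["city"])  # München
```

The design point is that the CSV-specific class only deals with parsing; choosing the codec stays in one place on the base class.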

Are you saying there is a bug and the option does not work properly with CSV files? If so, could you provide a minimal failing case?

Or is the problem that the documentation does not show that this option exists? In that case, could you provide a patch to the docs?

Here is the test case I wrote (against bonobo's develop version; a similar option should exist in 0.5, although you would not be able to use the environment the way I do):

import os
import bonobo

def extract():
    # A single row containing non-ASCII characters.
    yield {'data': "Qüë pënßëß tü dü l'üñïcödë?"}

def get_graph(*, encoding=None):
    graph = bonobo.Graph()
    # The encoding option is passed to open(); fall back to UTF-8 if unset.
    graph.add_chain(extract, bonobo.CsvWriter('output.csv', encoding=encoding or 'utf-8'))
    return graph

if __name__ == '__main__':
    parser = bonobo.get_argument_parser()
    # --env KEY=VALUE pairs end up as environment variables, read via os.getenv().
    with bonobo.parse_args(parser) as options:
        bonobo.run(get_graph(encoding=os.getenv('encoding')))

Then, in the console:

$ python coding.py --env encoding=cp1252
 - extract in=1 out=1 [done]
 - CsvWriter in=1 out=1 [done]
$ file output.csv
output.csv: ISO-8859 text
$ python coding.py --env encoding=utf-8
 - extract in=1 out=1 [done]
 - CsvWriter in=1 out=1 [done]
$ file output.csv
output.csv: UTF-8 Unicode text
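The two `file` results follow from how the encodings store the same characters: cp1252 uses one byte per accented letter, while UTF-8 uses two, so the bytes on disk differ. A quick standard-library check, using the start of the sample string from the test above:

```python
text = "Qüë pënßëß"

cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")

# One byte per character in cp1252 ...
print(len(text), len(cp1252_bytes))  # 10 10
# ... but each of the 6 non-ASCII letters takes two bytes in UTF-8.
print(len(utf8_bytes))  # 16
print("ü".encode("cp1252"), "ü".encode("utf-8"))  # b'\xfc' b'\xc3\xbc'

# For these characters cp1252 and ISO-8859-1 use identical byte values,
# which is presumably why `file` labels the cp1252 output "ISO-8859 text".
print(text.encode("latin-1") == cp1252_bytes)  # True
```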

hartym commented 7 years ago

Note: the issue looks like a problem with how we generate the API docs for configurables: the generated docs do not seem to show the available options at all.

hartym commented 6 years ago

http://docs.bonobo-project.org/en/master/reference/api/bonobo.html#bonobo.CsvReader