Closed mrecht closed 6 years ago
As CsvReader and CsvWriter are subclasses of FileHandler, they already have an "encoding" option, which is passed to the open() call for the file.
Are you saying there is a bug and the option does not work properly with CSV files? If so, could you provide a minimal failing case?
Or is the problem that the documentation does not mention that this option exists? If so, could you provide a patch to the docs?
Here is the test case I wrote (against bonobo's develop version; a similar option should exist in 0.5, although you would not be able to use --env the way I do):
    import os

    import bonobo


    def extract():
        yield {'data': "Qüë pënßëß tü dü l'üñïcödë?"}


    def get_graph(*, encoding=None):
        graph = bonobo.Graph()
        graph.add_chain(extract, bonobo.CsvWriter('output.csv', encoding=encoding or 'utf-8'))
        return graph


    if __name__ == '__main__':
        parser = bonobo.get_argument_parser()
        with bonobo.parse_args(parser) as options:
            bonobo.run(get_graph(encoding=os.getenv('encoding')))
Then in console:
$ python coding.py --env encoding=cp1252
- extract in=1 out=1 [done]
- CsvWriter in=1 out=1 [done]
$ file output.csv
output.csv: ISO-8859 text
$ python coding.py --env encoding=utf-8
- extract in=1 out=1 [done]
- CsvWriter in=1 out=1 [done]
$ file output.csv
output.csv: UTF-8 Unicode text
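For readers wondering why `file` reports two different formats for the same text, here is a small standard-library sketch. It does not use bonobo at all. It only shows the bytes Python produces for the test string under each encoding, which is presumably what ends up on disk when the option is passed through to open().

```python
# Standalone illustration, independent of bonobo: same text, two encodings.
text = "Qüë pënßëß tü dü l'üñïcödë?"

cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")

# cp1252 is a single-byte encoding, so every character takes one byte.
print(len(text), len(cp1252_bytes), len(utf8_bytes))

# A single umlaut shows the difference at the byte level.
u_cp1252 = "ü".encode("cp1252")  # b'\xfc'
u_utf8 = "ü".encode("utf-8")     # b'\xc3\xbc'
print(u_cp1252, u_utf8)
```

The UTF-8 output is longer because each accented character in this string takes two bytes there.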
Note: the real issue looks like a problem in how we generate the API docs for configurables, which do not seem to show the available options at all.
It would be helpful if the CsvReader class supported the encoding parameter the way the FileReader class does. Most of our files are encoded as cp1252 and contain German umlauts, so they cannot be loaded with the default utf-8 encoding. My current workaround is to use the FileReader and split the lines myself.
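To make the failure mode concrete, here is a minimal sketch using only Python's standard csv module rather than bonobo. The helper `read_csv_rows`, the file name, and the sample rows are made up for illustration. It writes a small cp1252 file, shows that decoding it as utf-8 fails, and reads it back with the matching encoding.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="utf-8"):
    """Hypothetical helper: return all rows of a CSV file decoded with `encoding`."""
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "umlauts.csv")

    # Create a sample file encoded as cp1252 (illustrative data).
    with open(path, "w", encoding="cp1252", newline="") as f:
        csv.writer(f).writerows([["name", "city"], ["Jürgen", "Köln"]])

    # Reading with the utf-8 default fails on the umlaut bytes.
    try:
        read_csv_rows(path)
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Reading with the matching encoding works.
    rows = read_csv_rows(path, encoding="cp1252")

print(utf8_failed, rows)
```

If the "encoding" option described in the reply above behaves as stated, passing encoding='cp1252' to CsvReader should have the same effect as the second call here, without the manual line splitting.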