python-bonobo / bonobo

Extract Transform Load for Python 3.5+
https://www.bonobo-project.org/
Apache License 2.0

[bug] allow to specify fields in csv writer #337

Closed: yanjost closed this issue 5 years ago

yanjost commented 5 years ago

Request: allow specifying the field names when writing to a CSV file.

Currently the code uses "context.get_input_fields()". Either I did not understand how it is supposed to be used, or it is a bug!

I will submit a patch.

hartym commented 5 years ago

Thanks for the report.

Although Sphinx (the documentation website) displays an outdated version of the code, this appears to be fixed already in the develop branch (see https://github.com/python-bonobo/bonobo/blob/develop/bonobo/nodes/io/csv.py#L108). Reading the code (and docs), the fields option should do the trick.
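To make the intended behaviour concrete, here is a minimal stdlib sketch of the precedence a fields option implies, assuming the writer falls back to its input fields when no explicit list is given. The names resolve_fields and write_csv are hypothetical; this is not bonobo's CsvWriter code.

```python
import csv
import io
from typing import Iterable, Optional, Sequence, Tuple


def resolve_fields(
    fields: Optional[Sequence[str]], input_fields: Optional[Sequence[str]]
) -> Tuple[str, ...]:
    """Hypothetical helper: an explicit field list wins, else use upstream fields."""
    if fields:
        return tuple(fields)
    if input_fields:
        return tuple(input_fields)
    raise ValueError("no field names: pass fields=... or name them upstream")


def write_csv(
    rows: Iterable[Sequence[object]],
    fields: Optional[Sequence[str]] = None,
    input_fields: Optional[Sequence[str]] = None,
) -> str:
    """Hypothetical writer: render rows as CSV text under the resolved header."""
    header = resolve_fields(fields, input_fields)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

For example, write_csv([(1, 2), (3, 4)], fields=["x", "y"]) returns "x,y\n1,2\n3,4\n", and the explicit list would override any input_fields passed alongside it.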

Can you confirm that this is the case, and that it fixes the problem on your side?

Thanks!

yanjost commented 5 years ago

Thanks for the speedy response! The fix in the develop branch seems very similar to mine, with an added skip_header option. Is it possible to merge it into master, or would the change be too large?

By the way, thank you for your work on this very simple and efficient tool!

hartym commented 5 years ago

If you can work on a backport to master (no new features, just the bug fix), I'm OK with merging it, although develop should be promoted to 0.7 quite soon (~1 month), so I'm not certain it's worth the hassle.

Could you use SetFields just before your CSV writer, so that get_input_fields() returns the right field names?

yanjost commented 5 years ago

I see that SetFields sets the "output fields", but CsvWriter uses the "input fields". Did I miss something?

hartym commented 5 years ago

Well, it sets its own output fields: if you pipe A into SetFields(...) into CsvWriter(...), then SetFields' output fields are also CsvWriter's input fields.
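The propagation rule described here (a node's output fields become the next node's input fields) can be sketched in plain Python. Node, SetFieldsNode, RecordingWriter and run_chain are hypothetical names invented for illustration; bonobo's execution contexts do this wiring differently and in more detail.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Fields = Optional[Tuple[str, ...]]


class Node:
    """Hypothetical node: receives input field names, declares output field names."""

    def __init__(self) -> None:
        self.input_fields: Fields = None

    def output_fields(self) -> Fields:
        # By default a node passes its input field names through unchanged.
        return self.input_fields

    def process(self, row: tuple) -> tuple:
        return row


class SetFieldsNode(Node):
    """Hypothetical stand-in for SetFields: names the columns, keeps the values."""

    def __init__(self, fields: Sequence[str]) -> None:
        super().__init__()
        self.fields = tuple(fields)

    def output_fields(self) -> Fields:
        return self.fields


class RecordingWriter(Node):
    """Hypothetical sink: its header is whatever arrives as input fields."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[tuple] = []

    def process(self, row: tuple) -> tuple:
        self.rows.append(row)
        return row


def run_chain(rows: Iterable[tuple], *nodes: Node) -> None:
    # Wire fields first: each node's input fields are its predecessor's outputs.
    upstream: Fields = None
    for node in nodes:
        node.input_fields = upstream
        upstream = node.output_fields()
    # Then push every row through the chain in order.
    for row in rows:
        for node in nodes:
            row = node.process(row)
```

Running run_chain([(1, 2), (3, 4)], SetFieldsNode(["x", "y"]), writer) leaves writer.input_fields equal to ("x", "y"), which is exactly what a CSV writer reading its input fields would use as a header.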

yanjost commented 5 years ago

Nice! I didn't get that from reading the documentation, my bad!

Since a workaround has been provided and the bug fix is coming soon, I think this issue can be closed.

hartym commented 5 years ago

(I'm checking that what I said actually works; I'll close the issue then.)

hartym commented 5 years ago

Actually, there is an issue in 0.7 related to this that I was not aware of; I made a quick fix in 84749be2c80ae67f2833010673e6217d0fffb97c. Now testing 0.6.

hartym commented 5 years ago

I can confirm the bug did not exist in the stable version.

Here is my minimal test case for 0.6:

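```python
import bonobo


def get_graph():
    graph = bonobo.Graph()
    graph.add_chain(
        # Source: each tuple is one row.
        [(1, 2), (3, 4), (5, 6)],
        # Name the columns; these become CsvWriter's input fields.
        bonobo.SetFields(["x", "y"]),
        # Write the rows to output.csv using those field names.
        bonobo.CsvWriter("output.csv"),
    )
    return graph


if __name__ == "__main__":
    with bonobo.parse_args() as options:
        bonobo.run(get_graph())
```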

Or the same thing using the new 0.7 syntax (the old one still works, of course):

```python
import bonobo


def get_graph():
    graph = bonobo.Graph()
    # Each ">>" appends the next node to the chain: source, rename, write.
    (
        graph
        >> [(1, 2), (3, 4), (5, 6)]
        >> bonobo.SetFields(["x", "y"])
        >> bonobo.CsvWriter("output.csv")
    )
    return graph


if __name__ == "__main__":
    with bonobo.parse_args() as options:
        bonobo.run(get_graph())
```
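For anyone curious how a ">>" chain like the one above can build a graph, here is a minimal sketch, assuming a cursor object that remembers the last node added. MiniGraph and Cursor are hypothetical names; bonobo's real implementation may differ.

```python
from typing import Any, List, Optional, Tuple


class MiniGraph:
    """Hypothetical graph storing nodes and directed edges by index."""

    def __init__(self) -> None:
        self.nodes: List[Any] = []
        self.edges: List[Tuple[int, int]] = []

    def add_node(self, node: Any) -> int:
        self.nodes.append(node)
        return len(self.nodes) - 1

    def __rshift__(self, node: Any) -> "Cursor":
        # "graph >> node" starts a new chain with no predecessor.
        return Cursor(self, None) >> node


class Cursor:
    """Points at the last node of a chain so that ">>" can keep appending."""

    def __init__(self, graph: MiniGraph, last: Optional[int]) -> None:
        self.graph = graph
        self.last = last

    def __rshift__(self, node: Any) -> "Cursor":
        index = self.graph.add_node(node)
        if self.last is not None:
            self.graph.edges.append((self.last, index))
        # Return a fresh cursor, so an earlier cursor can still branch.
        return Cursor(self.graph, index)
```

Because ">>" is left-associative, graph >> a >> b evaluates as (graph >> a) >> b, so every step after the first is handled by Cursor.__rshift__. Returning a new cursor each time also lets one node fan out to several successors.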