python-bonobo / bonobo

Extract Transform Load for Python 3.5+
https://www.bonobo-project.org/
Apache License 2.0

[doc] Basic documentation #1

Closed hartym closed 6 years ago

hartym commented 7 years ago
rsyring commented 7 years ago

Please consider adding a section that compares bonobo to PETL.

hartym commented 7 years ago

Yes, comparisons to other tools are planned.

In the list (feel free to complete it):

If an expert on any of those tools is available to help me make the most honest comparison possible, that would be amazing.

funkyfuture commented 7 years ago

ciao, bonobo might be something that i need as a pythonic replacement of xslt, thus i consulted the docs to get a grip of it. i didn't find out whether it fits, but i found some questions that would help me to figure it out. maybe that helps you when you update the docs (which i would strongly suggest as the library looks promising, but it's hard to judge if it'd be suited for a task.)

on a sidenote, what the heck is marketing-automation? how would that make the world a better place?

hartym commented 7 years ago

Hi @funkyfuture

It's not easy to understand what you're looking for. You say "pythonic replacement of xslt", and bonobo can transform XML into something else (or into other XML). That sounds like what you describe, but I'm not certain about your use case, or whether bonobo would be worth considering for it.

I'll try to answer your questions here, even though this would probably suit a discussion on Slack better than comments on an unrelated ticket. I'll consider your questions for a future F.A.Q. section in the docs (along with others, of course).

What exact facilities are available to control the evaluation logic of a graph?

I don't understand this question. Graphs are not "evaluated"; they are a tool to define the flow of data. Nodes in a graph are linked directionally, and when the graph is executed there are FIFO queues between the output of a node and the input of the next one. Those queues are only created by the executor, so executions are isolated from each other. Feel free to rephrase the question if this did not answer it.
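
To make the "FIFO queues between nodes" idea concrete, here is a minimal standard-library sketch. It is not bonobo's executor or API: `run_chain`, `extract` and the `_END` sentinel are hypothetical names invented for this illustration, and it assumes a purely linear chain with one thread per node.

```python
import queue
import threading

_END = object()  # hypothetical sentinel marking the end of a stream


def run_chain(source, *transforms):
    """Toy executor: one thread per node, one FIFO queue per edge."""
    # The queues are created here, at execution time, so every run
    # gets its own set and runs cannot interfere with each other.
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]

    def feed():
        # The first node only produces; it pushes into the first queue.
        for item in source():
            queues[0].put(item)
        queues[0].put(_END)

    def work(transform, inbox, outbox):
        # Every other node reads its input queue and writes its output queue.
        while True:
            item = inbox.get()
            if item is _END:
                outbox.put(_END)
                return
            outbox.put(transform(item))

    threads = [threading.Thread(target=feed)]
    for i, transform in enumerate(transforms):
        threads.append(
            threading.Thread(target=work, args=(transform, queues[i], queues[i + 1]))
        )
    for thread in threads:
        thread.start()

    # Drain the last queue to collect what the final node produced.
    results = []
    while True:
        item = queues[-1].get()
        if item is _END:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


def extract():
    yield from ("foo", "bar", "baz")


print(run_chain(extract, str.upper, lambda s: s + "!"))
# prints ['FOO!', 'BAR!', 'BAZ!']
```

The graph definition (the order of callables) carries no state; all state lives in the queues built inside `run_chain`, which is the isolation property described above.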

Can a graph contain another graph?

There are no tools in bonobo today to insert a graph as a subgraph. It would be great to allow it, but there are a few design questions behind this, such as which nodes to use as the input and output of the subgraph. It is probably something that will come way after 1.0.

How would one access contextual data from a transformation? / are there parameter injections like pytest's fixtures?

You have both the question and the answer here. There are parameter injections like pytest fixtures, and they are the way to access contextual data in a transformation. The API may evolve a bit though, because it feels a bit hackish as it is. It's the right concept, but the exact syntax makes me feel it's not the best experience we can offer. To understand how it works today, look at https://github.com/python-bonobo/bonobo/blob/0.2/bonobo/io/csv.py#L63 and the class hierarchy around it.
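
As a rough illustration of fixture-style injection, the sketch below uses only the standard library. It does not reproduce the syntax of the linked bonobo file: `with_resource`, `csv_buffer` and `write_row` are hypothetical names, and the only idea borrowed from pytest is that a generator does its setup before `yield` and its teardown after it.

```python
import functools


def with_resource(setup):
    """Hypothetical decorator: inject what `setup` yields as the first argument."""

    def decorator(transform):
        @functools.wraps(transform)
        def run_over(rows):
            gen = setup()
            resource = next(gen)  # setup phase: runs up to the yield
            try:
                for row in rows:
                    transform(resource, row)
            finally:
                next(gen, None)  # teardown phase: runs the code after the yield
            return resource

        return run_over

    return decorator


def csv_buffer():
    lines = ["id,name"]  # setup: start the output with a header
    yield lines
    lines.append("# end")  # teardown: close the output with a footer


@with_resource(csv_buffer)
def write_row(lines, row):
    # `lines` is injected; the transformation only deals with one row.
    lines.append("{},{}".format(*row))


print(write_row([(1, "ada"), (2, "bob")]))
# prints ['id,name', '1,ada', '2,bob', '# end']
```

The transformation never creates or closes the shared resource itself, which is what makes the contextual data reachable without globals.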

Are there yet any concepts how to process trees, like xml?

There was an "xml mapper" in bonobo's ancestor that had a bit of logic to describe how to go from an XML "blob" to lines of data (cf. https://github.com/hartym/rdc.etl/blob/dev/rdc/etl/transform/map/xml.py). It's not exactly "tree processing", but since an ETL is a line-by-line processor, you need to be able to transform your tree into something flatter, and there are many ways to do so: depth first, breadth first, skipping items or not, preprocessing depending on type, etc. It may be better to just write your flattening logic in a function, then process the result with regular tools, as it's not a tree anymore.
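
A flattening function of that kind could look like the following sketch, which uses only `xml.etree.ElementTree` from the standard library. The `flatten` helper, the row shape `(path, text)` and the sample document are assumptions made for illustration; this is one depth-first option among the many mentioned above, not the rdc.etl mapper.

```python
import xml.etree.ElementTree as ET


def flatten(element, path=""):
    """Depth-first walk yielding one (path, text) row per element with text."""
    here = path + "/" + element.tag
    text = (element.text or "").strip()
    if text:
        yield here, text
    for child in element:
        # Recurse into children before moving on to siblings.
        yield from flatten(child, here)


doc = ET.fromstring(
    "<library>"
    "<book><title>Dune</title><year>1965</year></book>"
    "<book><title>Emma</title></book>"
    "</library>"
)

rows = list(flatten(doc))
for row in rows:
    print(row)
# prints:
# ('/library/book/title', 'Dune')
# ('/library/book/year', '1965')
# ('/library/book/title', 'Emma')
```

Once the tree is reduced to rows like these, each row can go through ordinary line-by-line transformations.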

How is a plugin distinguished from a python import in a module that contains transformation callables?

Transformation callables are just regular callables, and nothing differentiates them from other Python callables. You can even use the same callable both in an imperative programming context and in a transformation graph, no problem. Plugins in bonobo are a different concept that allows one to "enhance" executions in a generic way. For example, the console plugin enhances execution with a nice ANSI output that displays statistics while the execution is running (https://github.com/python-bonobo/bonobo/blob/0.2/bonobo/ext/console/plugin.py). I'd say there is no need to think about this for standard ETL cases; it's more a way to extend the framework itself than something for userland.
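
The distinction can be sketched in a few lines of plain Python. Nothing here is bonobo's plugin interface: `CountingPlugin`, its `on_row` hook and the toy `run` function are hypothetical, and only show a transformation working on data while a plugin observes the execution.

```python
def double(x):
    """A plain function: nothing marks it as a 'transformation'."""
    return 2 * x


class CountingPlugin:
    """Hypothetical plugin: watches the run without touching the data."""

    def __init__(self):
        self.seen = 0

    def on_row(self, row):
        self.seen += 1


def run(rows, transform, plugins=()):
    """Toy executor: applies `transform` and notifies plugins of each result."""
    out = []
    for row in rows:
        result = transform(row)
        for plugin in plugins:
            plugin.on_row(result)
        out.append(result)
    return out


# Imperative use: call the function directly.
print(double(21))
# prints 42

# Use inside an execution, with a plugin collecting statistics on the side.
counter = CountingPlugin()
result = run([1, 2, 3], double, plugins=[counter])
print(result, counter.seen)
# prints [2, 4, 6] 3
```

Removing the plugin changes nothing about the output rows, which is why plugins can be ignored for standard ETL cases.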

On a sidenote, what the heck is marketing-automation? how would that make the world a better place?

It is tagged as such because I have use cases where I use bonobo for marketing automation. It's probably a derivative usage and not the main point, but I guess there is such a use case (think IFTTT or Zapier, but programmatic). Bonobo never promised to "make the world a better place", but I'd say it's a good thing for you if you're wasting time on repetitive marketing tasks and bonobo helps you automate them. My own sidenote: I don't understand why people tend to think marketing is a bad thing.

I hope this answers your questions; if not, let's have a chat on Slack so I can better understand your points.