Open davidwaroquiers opened 4 months ago
Attention: 3 lines
in your changes are missing coverage. Please review.
Comparison is base (
eda2a65
) 99.42% compared to head (b246ce6
) 97.48%. Report is 4 commits behind head on main.
Thank you for kicking this off (again), @davidwaroquiers! As you already know, I'm a huge proponent of this idea in general.
I'll let @utf take the lead on any logistical comments, but from me, I'll just say this is exciting to see. Thanks for baking in my WIP PR in #450 as well!
In general, the UX here is pretty much exactly what I would expect to see.
the API this enables looks very appealing! if all edge cases that might negatively affect UX can be addressed, this looks like a big win.
Currently breaks @janosh magic methods.
weird... do you know why?
Hi @davidwaroquiers, thanks for this PR. This is an interesting idea and implementation.
As I mentioned in #450, I have some concerns with the indexing features. While these can definitely be addressed, I think it would be better to remove those features from this PR so we can focus only on the @flow
decorator. I've copied my concerns below.
Regarding the specific implementation, it looks promising. I would like to spend a little more time thinking it over and testing it locally.
Incompatibility with output attributes
One potential confusion is that you aren't able to do something like:
job2 = capitalize(job1.hello)
We could add that functionality in, but it then raises an issue with
job1.output
e.g., if your output document has an attribute calledoutput
this will return theOutputReference
and notOutputReference.output
.A potential fix would be to rename all of the Job attributes and functions to
attribute_
, E.g.,job.output_
,job.graph_
,job.update_maker_kwargs_
etc. This would be a breaking change but not too onerous to fix.Incompatibility with jobs without a dictionary output
Another issue is with jobs that output a python primitive directly. E.g., your
capitalize
function. Consider these jobs.@job def capitalize(s): return s.upper() @job def decapitalize(s): return s.lower()
This change might think users than can do:
job1 = capitalize("hello") job2 = decapitalize(job1) flow = Flow([job1, job2])
But that will fail since jobs can only accept
OutputReferences
(or other python objects) as inputs.
@davidwaroquiers: is there anything here that you would like me to prototype or add onto to help keep the ball rolling on this? I can't make any guarantees of course but am happy to try and help where I can.
@davidwaroquiers: is there anything here that you would like me to prototype or add onto to help keep the ball rolling on this? I can't make any guarantees of course but am happy to try and help where I can.
Not sure I can catch up on this at this stage. It's just that I had some time and I had thought about it so I decided to put it in a WIP PR for discussion. Also, before going on with this implementation, we should sit down and see what we want to have, what's "nice to have" and what's not on the table. While I agree with @utf that we should separate concerns, probably we should still first discuss the user experience. Basically "write down" how we would like to write our workflows, i.e. all cases we are thinking about, and see which are already handled, which would need the flow decorator and which would need the indexing features. I'm open to have a meeting on this if it could make things move.
Summary
This is a WIP PR trying to develop the flow decorator (see idea #416, also linked to #450). Seems like we could have another option to do what we want. Currently in a (very) proof-of-concept phase (previous attempt with abstract syntax trees failed).
Currently works for "simple" flows (i.e. flows of jobs), and only if the script creates only one flow. I think it should be generalizable, but before spending more time, I'd like to get some feedback from @utf, @Andrew-S-Rosen, @janosh, and maybe others. Basically, it works by adding a reference to the created jobs and flows into a "global" variable (I used the singleton from monty, not a standard python global variable, not exactly sure if it is needed and/or if it helps and/or what it adds in this specific case).
Using what @Andrew-S-Rosen did for having something more user friendly (see #450).
Currently breaks @janosh magic methods.
Another addition should be that the flow output is stored in the database somehow ? (some things to be made sure here though, we don't want to duplicate outputs both in jobs and flows).
Example in test_flows.py:test_flow_decorator
TODO (if any)
If this is a work-in-progress, write something about what else needs to be done.
Checklist
Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request title.
Before a pull request can be merged, the following items must be checked:
Note that the CI system will run all the above checks. But it will be much more efficient if you already fix most errors prior to submitting the PR. It is highly recommended that you use the pre-commit hook provided in the repository. Simply
pip install pre-commit
and thenpre-commit install
and a check will be run prior to allowing commits.