materialsproject / jobflow

jobflow is a library for writing computational workflows.
https://materialsproject.github.io/jobflow
Other
90 stars 25 forks source link

WIP: Flow decorator #547

Open davidwaroquiers opened 4 months ago

davidwaroquiers commented 4 months ago

Summary

This is a WIP PR trying to develop the flow decorator (see idea #416, also linked to #450). Seems like we could have another option to do what we want. Currently in a (very) proof-of-concept phase (previous attempt with abstract syntax trees failed).

Currently works for "simple" flows (i.e. flows of jobs), and only if the script creates only one flow. I think it should be generalizable, but before spending more time, I'd like to get some feedback from @utf, @Andrew-S-Rosen, @janosh, and maybe others. Basically, it works by adding a reference to the created jobs and flows into a "global" variable (I used the singleton from monty, not a standard python global variable, not exactly sure if it is needed and/or if it helps and/or what it adds in this specific case).

Using what @Andrew-S-Rosen did for having something more user friendly (see #450).

Currently breaks @janosh magic methods.

Another addition should be that the flow output is stored in the database somehow ? (some things to be made sure here though, we don't want to duplicate outputs both in jobs and flows).

Example in test_flows.py:test_flow_decorator

TODO (if any)

If this is a work-in-progress, write something about what else needs to be done.

Checklist

Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request title.

Before a pull request can be merged, the following items must be checked:

Note that the CI system will run all the above checks. But it will be much more efficient if you already fix most errors prior to submitting the PR. It is highly recommended that you use the pre-commit hook provided in the repository. Simply pip install pre-commit and then pre-commit install and a check will be run prior to allowing commits.

codecov[bot] commented 4 months ago

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (eda2a65) 99.42% compared to head (b246ce6) 97.48%. Report is 4 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #547 +/- ## ========================================== - Coverage 99.42% 97.48% -1.94% ========================================== Files 21 21 Lines 1564 1592 +28 Branches 425 429 +4 ========================================== - Hits 1555 1552 -3 - Misses 9 36 +27 - Partials 0 4 +4 ``` | [Files](https://app.codecov.io/gh/materialsproject/jobflow/pull/547?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=materialsproject) | Coverage Δ | | |---|---|---| | [src/jobflow/\_\_init\_\_.py](https://app.codecov.io/gh/materialsproject/jobflow/pull/547?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=materialsproject#diff-c3JjL2pvYmZsb3cvX19pbml0X18ucHk=) | `100.00% <100.00%> (ø)` | | | [src/jobflow/core/job.py](https://app.codecov.io/gh/materialsproject/jobflow/pull/547?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=materialsproject#diff-c3JjL2pvYmZsb3cvY29yZS9qb2IucHk=) | `97.78% <95.23%> (-2.22%)` | :arrow_down: | | [src/jobflow/core/flow.py](https://app.codecov.io/gh/materialsproject/jobflow/pull/547?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=materialsproject#diff-c3JjL2pvYmZsb3cvY29yZS9mbG93LnB5) | `89.77% <84.61%> (-10.23%)` | :arrow_down: |
Andrew-S-Rosen commented 4 months ago

Thank you for kicking this off (again), @davidwaroquiers! As you already know, I'm a huge proponent of this idea in general.

I'll let @utf take the lead on any logistical comments, but from me, I'll just say this is exciting to see. Thanks for baking in my WIP PR in #450 as well!

In general, the UX here is pretty much exactly what I would expect to see.

janosh commented 4 months ago

the API this enables looks very appealing! if all edge cases that might negatively affect UX can be addressed, this looks like a big win.

Currently breaks @janosh magic methods.

weird... do you know why?

utf commented 4 months ago

Hi @davidwaroquiers, thanks for this PR. This is an interesting idea and implementation.

As I mentioned in #450, I have some concerns with the indexing features. While these can definitely be addressed, I think it would be better to remove those features from this PR so we can focus only on the @flow decorator. I've copied my concerns below.

Regarding the specific implementation, it looks promising. I would like to spend a little more time thinking it over and testing it locally.

Incompatibility with output attributes

One potential confusion is that you aren't able to do something like:

job2 = capitalize(job1.hello)

We could add that functionality in, but it then raises an issue with job1.output e.g., if your output document has an attribute called output this will return the OutputReference and not OutputReference.output.

A potential fix would be to rename all of the Job attributes and functions to attribute_, E.g., job.output_, job.graph_, job.update_maker_kwargs_ etc. This would be a breaking change but not too onerous to fix.

Incompatibility with jobs without a dictionary output

Another issue is with jobs that output a python primitive directly. E.g., your capitalize function. Consider these jobs.

@job
def capitalize(s):
    return s.upper()

@job
def decapitalize(s):
    return s.lower()

This change might think users than can do:

job1 = capitalize("hello")
job2 = decapitalize(job1)
flow = Flow([job1, job2])

But that will fail since jobs can only accept OutputReferences (or other python objects) as inputs.

Andrew-S-Rosen commented 4 months ago

@davidwaroquiers: is there anything here that you would like me to prototype or add onto to help keep the ball rolling on this? I can't make any guarantees of course but am happy to try and help where I can.

davidwaroquiers commented 3 months ago

@davidwaroquiers: is there anything here that you would like me to prototype or add onto to help keep the ball rolling on this? I can't make any guarantees of course but am happy to try and help where I can.

Not sure I can catch up on this at this stage. It's just that I had some time and I had thought about it so I decided to put it in a WIP PR for discussion. Also, before going on with this implementation, we should sit down and see what we want to have, what's "nice to have" and what's not on the table. While I agree with @utf that we should separate concerns, probably we should still first discuss the user experience. Basically "write down" how we would like to write our workflows, i.e. all cases we are thinking about, and see which are already handled, which would need the flow decorator and which would need the indexing features. I'm open to have a meeting on this if it could make things move.