pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License
14 stars 1 forks source link

[patch] List-like output for for-loops #444

Closed liamhuber closed 2 months ago

liamhuber commented 2 months ago

WIP

Closes #440

@ligerzero-ai, I still need to sand down the rough edges, but I hope this gets 70% of the way to resolving the issues you ran into when making the equation of state stuff.

Before, because the for-loop was returning a dataframe, accessing the underlying dataclass attributes was a pain in the butt and was easiest to accomplish with custom logic

from pyiron_workflow import Workflow

@Workflow.wrap.as_dataclass_node
class SomeData:
    number: int = 42
    word: str = "towel"

wf = Workflow("data_access")

wf.loop = Workflow.create.for_node(
    body_node_class=SomeData,
    iter_on=("number",),
    number=[1, 2, 3],
    word="go",
)

@Workflow.wrap.as_function_node("just_one_field")
def Unpack(data):
    return [d.number for d in data["dataclass"].tolist()]

wf.unpacked = Unpack(wf.loop.outputs.df)

out = wf()
out
>>> {'unpacked__just_one_field': [1, 2, 3]}

Now, the same thing can be accomplished using built-in tools, in this case a second for-loop for the attribute access:

@Workflow.wrap.as_dataclass_node
class SomeData:
    number: int = 42
    word: str = "towel"

wf = Workflow("data_access")

wf.loop = Workflow.create.for_node(
    body_node_class=SomeData,
    iter_on=("number",),
    number=[1, 2, 3],
    word="go",
    output_as_dataframe=False,
)

wf.unpacked = Workflow.create.for_node(
    body_node_class=Workflow.create.standard.GetAttr,
    iter_on=("obj",),
    obj=wf.loop.outputs.dataclass,
    name="number",
    output_as_dataframe=False,  # For backwards compatibility, dataframes are still default!
)

out = wf()
out
>>> {
...     'loop__number': [1, 2, 3],
...     'unpacked__obj': [
...         SomeData.dataclass(number=1, word='go'), 
...         SomeData.dataclass(number=2, word='go'),
...         SomeData.dataclass(number=3, word='go')],
...     ],
...     'unpacked__getattr': [1, 2, 3]
... }

At the moment we still need the extra step to unpack the dataclass. I can anticipate some feedback that this is still annoying, but at least for the moment I want to move one step at a time here. Basically it's not immediately clear to me whether we should be building up a bunch of infrastructure to help users unpack their dataclasses all over the place, or whether it's a better practice to leave them packages and encourage node packages that use dataclasses to use them throughout the node package (e.g. with multiple dispatch to allow dataclass or explicit input).

In the meantime I'm content that the extra step is at least "standard" stuff and not a custom node definition. I have some ideas for tightening the for-loop syntax though -- e.g. at least iter_on (and zip_on) should be able to take a single string instead of a tuple of length 1...

github-actions[bot] commented 2 months ago

Binder :point_left: Launch a binder notebook on branch _pyiron/pyiron_workflow/dataaccess

codacy-production[bot] commented 2 months ago

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
:white_check_mark: +0.14% (target: -1.00%) :white_check_mark: 100.00%
Coverage variation details | | Coverable lines | Covered lines | Coverage | | ------------- | ------------- | ------------- | ------------- | | Common ancestor commit (427733bb144b72b51b626c4cf920083e1fb95698) | 3127 | 2856 | 91.33% | | | Head commit (0a4e05e28e7032249ec077812b5dc48db6c40182) | 3177 (+50) | 2906 (+50) | 91.47% (**+0.14%**) | **Coverage variation** is the difference between the coverage for the head and common ancestor commits of the pull request branch: ` - `
Diff coverage details | | Coverable lines | Covered lines | Diff coverage | | ------------- | ------------- | ------------- | ------------- | | Pull request (#444) | 73 | 73 | **100.00%** | **Diff coverage** is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: `/ * 100%`

See your quality gate settings    Change summary preferences

Codacy stopped sending the deprecated coverage status on June 5th, 2024. Learn more

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 10752803009

Details


Files with Coverage Reduction New Missed Lines %
type_hinting.py 1 97.62%
nodes/for_loop.py 3 98.62%
nodes/static_io.py 3 85.71%
<!-- Total: 7 -->
Totals Coverage Status
Change from base Build 10693688049: 0.1%
Covered Lines: 2906
Relevant Lines: 3177

💛 - Coveralls
ligerzero-ai commented 2 months ago

Thanks a lot Liam, I'll have a look at it over the weekend.

review-notebook-app[bot] commented 2 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

liamhuber commented 2 months ago

From the updated quickstart:

wf = Workflow("listlike_for_loop")
wf.first_loop = Workflow.create.for_node(
    body_node_class=Workflow.create.standard.Add,
    iter_on=("other",),
    obj=1,
    other=[1, 2, 4],
    output_as_dataframe=False,  # This is what's different
)
wf.second_loop = Workflow.create.for_node(
    body_node_class=Workflow.create.standard.Multiply,
    iter_on=("other",),
    obj=2,
    other=wf.first_loop.outputs.add,   # This is the nice part it enables!
    output_as_dataframe=False,
)
wf()
>>> {
...     'first_loop__other': [1, 2, 4],
...      'second_loop__other': [2, 3, 5],
...      'second_loop__mul': [4, 6, 10]
... }

Where we see the (in this case only) looped output mul as a channel of list data, and since they aren't connected to anything, the "what input got this output" other channels for both loops.

In the future I can see making the list-like output default and forcing people to ask for the dataframe.