pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License

[minor] Factories instead of meta nodes, and use them for transformer nodes #293

Closed liamhuber closed 2 months ago

liamhuber commented 2 months ago

So the issue is that I want IO to be well-defined at the class level, and I want to be able to dynamically create nodes, and I want it all to be pickle-able. These things are important for ideas like a "for-loop", where we want to be able to make new nodes on-the-fly for parsing different sizes of input, or for converting dataclasses into nodes (cf. #268).

What's here is still super rough -- it's too verbose and has too much unnecessary misdirection -- but the basic idea is to make some child of StaticNode that leverages different class attributes to construct its IO pattern (and implements whatever functionality you want for the node), and then on top of that to have a constructor function that dynamically makes subclasses of this node with different class attributes set.

This directly accomplishes the first two goals, and plays nicely with pickle's __reduce__ dunder, which lets you specify a custom reconstructor. (Although h5io still hard fails for custom reconstructors, so these nodes won't be storable in that paradigm.)
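As a minimal sketch of that mechanism (hypothetical names, not the pyiron_workflow API): a dynamically created class can round-trip through pickle if its instances tell pickle to rebuild by re-running the factory.

```python
# Hypothetical sketch (not pyiron_workflow API) of pickling an instance of a
# dynamically created class via a custom reconstructor supplied by __reduce__.
import pickle


class DynamicBase:
    n = None  # set on dynamically created subclasses

    def __init__(self, value=None):
        self.value = value

    def __reduce__(self):
        # Rebuild by re-running the factory, then restore instance state
        return _reconstruct, (self.n,), self.__dict__


def node_class_factory(n):
    # Each call builds a subclass parametrized by the class attribute n
    return type(f"Node{n}", (DynamicBase,), {"n": n})


def _reconstruct(n):
    return node_class_factory(n)()


obj = node_class_factory(3)(value=42)
reloaded = pickle.loads(pickle.dumps(obj))
assert reloaded.n == 3 and reloaded.value == 42
```

Because `__reduce__` names a plain module-level function (`_reconstruct`) as the reconstructor, pickle never needs to look the dynamic class up by name.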

The first use case for these is simple transformer nodes, like converting a bunch of inputs to a list or vice versa. From the outside these look like the old to- and from-list nodes, which were constructed by defining code as a string and then exec'ing it, but now it's done much more cleanly.

Example: a simple auto-encoder

import pickle

from pyiron_workflow import Workflow

wf = Workflow("tmp")
n = 3
wf.inp = Workflow.create.meta.to_list_node(n, 1, 2, 3)
wf.out = Workflow.create.meta.from_list_node(n, wf.inp)
out = wf()
print(out)

reloaded = pickle.loads(pickle.dumps(wf))
print(reloaded.outputs.to_value_dict())
print(reloaded.inp.__class__.__name__)
print(reloaded.inp.outputs.list)
>>> {'out__item_0': 1, 'out__item_1': 2, 'out__item_2': 3}
>>> {'out__item_0': 1, 'out__item_1': 2, 'out__item_2': 3}
>>> InputsToList_length3
>>> [1, 2, 3]

Note how the entire workflow is pickleable! I want to extend the idea of meta-nodes to Function and Macro instances (i.e. those not created via decorators, since decorated nodes tend to be directly importable anyhow -- ok, let's do an aside:

@Workflow.wrap.as_function_node("x")
def Foo(x):
    return x

foo = Foo(5)
foo()

reloaded = pickle.loads(pickle.dumps(foo))
reloaded.outputs.to_value_dict()
>>> {'x': 5}

Works totally fine these days, but

def bar(x):
    return x

bar = Workflow.create.function_node(bar, output_labels="x", x=5)
bar()

reloaded = pickle.loads(pickle.dumps(bar))
reloaded.outputs.to_value_dict()

Gives a pickling error, PicklingError: Can't pickle <class '__main__.foo_function'>: it's not the same object as __main__.foo_function
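For context, a minimal illustration separate from pyiron_workflow: pickle stores classes by reference (module plus name), so an instance of a dynamically created class fails to pickle whenever that name doesn't resolve back to the same class object in its module.

```python
# Minimal illustration of the failure mode: pickle looks classes up by
# module + name, so an instance of a dynamically created class whose name
# doesn't resolve in its module raises a PicklingError.
import pickle


def make_class(name):
    return type(name, (), {})


Renamed = make_class("Original")  # bound as "Renamed", but named "Original"

try:
    pickle.dumps(Renamed())
except pickle.PicklingError as error:
    print(error)  # lookup of "Original" in this module fails
```

This is the same by-reference lookup that trips up the non-decorator `function_node` path above, and it's exactly what a custom `__reduce__` reconstructor sidesteps.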

/aside )

I'm playing around with __new__ and an as_meta_node decorator to try and get a cleaner abstraction, but it's not there yet.

UPDATE:

I'm still not super satisfied with the user-facing syntax, which remains more verbose than I'd like, but I'm getting happier with the abstraction.

I now introduce two new features: snippets.singleton.registered_factory, which lets your class factories return the same object when they would return classes with the same name; and snippets.constructed.Constructed and snippets.constructed.mix_and_construct_instance, a mix-in class and wrapper function, respectively, for making dynamically created classes pickleable (via __reduce__, just like in the earlier comments).

I am quite convinced that there's a way to integrate these two into something more succinct and powerful, but I think I'll pause here and move on to some practical stuff. They are already super useful for the classes converting input channels to a list and lists to output channels. The only thing that totally drives me crazy right now is that using this stuff as a decorator can get ugly, since then Python can't find the underlying factory function to import.

Anyhow, here is a working example using the Constructed mixin:

from abc import ABC
import pickle

from pyiron_workflow.snippets.singleton import registered_factory
from pyiron_workflow.snippets.constructed import Constructed

class Foo(Constructed, ABC):
    def __init_subclass__(cls, /, n=0, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.n = n

    def __init__(self, x):
        self.x = x

def constructed_foo_factory(n):
    return type(
        f"{Foo.__name__}{n}",
        (Foo,),
        {},
        n=n,
        class_factory=constructed_foo_factory,
        class_factory_args=(n,),
        class_instance_args=(0,),
    )

registered_foo_factory = registered_factory(constructed_foo_factory)

FooTwo = registered_foo_factory(2)
FooToo = registered_foo_factory(2)
assert(FooTwo is FooToo)

f = FooTwo(42)

s = pickle.dumps(f)
reloaded = pickle.loads(s)
assert(f.n == reloaded.n)
assert(f.x == reloaded.x)

And using the wrapper for classes that don't inherit from Constructed to achieve the same result:

from abc import ABC
import pickle

from pyiron_workflow.snippets.singleton import registered_factory
from pyiron_workflow.snippets.constructed import mix_and_construct_instance

class Foo(ABC):
    def __init_subclass__(cls, /, n=0, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.n = n

    def __init__(self, x):
        self.x = x

def foo_factory(n):
    return type(
        f"{Foo.__name__}{n}",
        (Foo,),
        {},
        n=n,
    )

registered_foo_factory = registered_factory(foo_factory)

f = mix_and_construct_instance(
    registered_foo_factory,
    (2,),  # Factory args
    {},  # Factory kwargs
    (42,),  # Instance args (overridden via __getstate__)
    {},  # Instance kwargs (overridden via __getstate__)
    {"n": 2, },  # __init_subclass__ kwargs 
    # The subclass kwargs are a duplicate of factory info...
    # This is annoying for users of `mix_and_construct_instance`, 
    # but not difficult.
)

assert(f.n == 2)
assert(f.x == 42)

reloaded = pickle.loads(pickle.dumps(f))
assert(f.n == reloaded.n)
assert(f.x == reloaded.x)

UPDATE 2:

I tried using it and didn't like it, so I came back. I'm now super, duper happy. There is both a decorator interface, @classfactory, and a constructor, ClassFactory; both wrap functions that return a tuple in direct analogy to the one consumed by builtins.type. Resulting factories (and classes) have object equivalence based on the factory function (and generated class name, respectively) -- that means users are responsible for making sure their class names are non-degenerate, but IMO that is a totally fair requirement. Factories, classes, and resulting instances are all (un)pickleable, and factory-generated classes can themselves be re-used in downstream factory functions without trouble.
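As a rough sketch of the behavior described above (my own minimal reimplementation, not the pyiron_workflow source -- the real classfactory also handles __init_subclass__ kwargs and pickling), the decorated function returns the tuple that builtins.type consumes, and the factory caches by generated class name:

```python
# Rough sketch of a classfactory-style decorator: the wrapped function returns
# (name, bases, namespace) as consumed by builtins.type, and results are cached
# by generated class name so equal names yield the very same class object.
import functools

_registry = {}


def classfactory(factory_fn):
    @functools.wraps(factory_fn)
    def factory(*args, **kwargs):
        name, bases, namespace = factory_fn(*args, **kwargs)
        if name not in _registry:
            _registry[name] = type(name, bases, namespace)
        return _registry[name]
    return factory


@classfactory
def int_holder_factory(n):
    # Users are responsible for keeping generated names non-degenerate
    return f"IntHolder{n}", (object,), {"n": n}


IntHolder2 = int_holder_factory(2)
assert int_holder_factory(2) is IntHolder2  # object equivalence by name
assert IntHolder2.n == 2
```

Caching on the generated name rather than the factory arguments is what makes object equivalence depend on non-degenerate class names.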

I'll apply these changes to the Transformer stuff, but first I want to go write a little pedagogical blog post for the upcoming meeting.

UPDATE 3:

classfactory worked exactly as hoped for the transformers. I switched from __init_subclass__ to just defining (sometimes un-defaulted) typing.ClassVar attributes, since __init_subclass__ was too much of a pain when one abstract class inherits from another.

I.e. this throws a super annoying type error (the class Bar(Foo) statement itself already invokes Foo.__init_subclass__ without n):

class Foo:
    def __init_subclass__(cls, /, n, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.n = n

class Bar(Foo):
    def __init_subclass__(cls, /, n, m, **kwargs):
        super().__init_subclass__(n=n, **kwargs)
        cls.m = m

Baz = type(
    "Baz",
    (Bar,),
    {},
    **{
        "n": 1,
        "m": 2,
    }
)
>>> TypeError: Foo.__init_subclass__() missing 1 required positional argument: 'n'

So I opted for this pattern instead:

from typing import ClassVar

class Foo:
    n: ClassVar[int]

class Bar(Foo):
    m: ClassVar[int]

Baz = type(
    "Baz",
    (Bar,),
    {
        "n": 1,
        "m": 2,
    },
)
review-notebook-app[bot] commented 2 months ago

Check out this pull request on  ReviewNB


github-actions[bot] commented 2 months ago

Binder :point_left: Launch a binder notebook on branch _pyiron/pyiron_workflow/transformernodes

codacy-production[bot] commented 2 months ago

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| ------------- | ------------- |
| :white_check_mark: +0.71% (target: -1.00%) | :white_check_mark: 97.73% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| ------------- | ------------- | ------------- | ------------- |
| Common ancestor commit (744ebfcbb0427a2381c0266c91bd0ef4a99e5af9) | 3501 | 3070 | 87.69% |
| Head commit (498559d1d40d02a6e5dda371005fd66b2d4726d4) | 3698 (+197) | 3269 (+199) | 88.40% (**+0.71%**) |

**Coverage variation** is the difference between the coverage for the head and common ancestor commits of the pull request branch.

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| ------------- | ------------- | ------------- | ------------- |
| Pull request (#293) | 220 | 215 | **97.73%** |

**Diff coverage** is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified.


Codacy will stop sending the deprecated coverage status from June 5th, 2024.

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 8883581121

Details


| Files with Coverage Reduction | New Missed Lines | % |
| ------------- | ------------- | ------------- |
| io_preview.py | 6 | 94.38% |
| create.py | 7 | 87.88% |
| Total | 13 | |

Totals Coverage Status
Change from base Build 8883544557: 0.7%
Covered Lines: 3269
Relevant Lines: 3698

💛 - Coveralls