JNmpi opened this issue 1 year ago
I played a bit with a class that extends the node generator to write the Python function to a repository consisting of directories and a Python file. Having such a feature provides the following advantages:
- The user can stay inside the notebook to add new node repositories or nodes to existing ones. Of course, if preferred, users can directly add/edit such files outside Jupyter using their editor of choice.
- Storing and sharing workflows requires storing not only the state of the node but also the node definition. For nodes defined in a local Jupyter session, such functionality provides an easy way to store local nodes.
- This feature naturally extends our create construction.
@JNmpi, awesome! It will take me a while to go through the classes in detail, but overall this sounds A-OK to me!
Key criteria are:
- Make adding new nodes easy (via decorator or simply adding functions to a file)
I like being able to have plain-text code get serialized from inside the notebook. I have some mild concern about the "simply adding functions to a file" part, as the choice of which node decorator to apply ("standard" (i.e. "fast"), "slow", or "single-value") is not totally trivial, and I'm not sure we want to automate that part away just so that users don't need to add a single `@Workflow.wrap_as...` line above their functions. But this concern is mild, and I may change my tune after digging into your implementation.
- Use directories to build up a class schema (e.g. workflow.create.math.numpy)
Being able to register an entire directory with multiple sub-modules of stuff to import, instead of only being able to register a single depth of namespace after `create` (or currently `add`), sounds good to me!
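To make the directory idea concrete, here is a minimal sketch of how a package of node modules could be walked and exposed under a nested namespace. This is not the existing registration API; the helper `register_package` and the example package name `my_nodes` are made up for illustration.

```python
# Hypothetical sketch: walk a directory of node modules and expose them as
# nested attributes, e.g. registry.math.numpy -> my_nodes/math/numpy.py.
import importlib
import pkgutil
from types import SimpleNamespace


def register_package(package_name: str) -> SimpleNamespace:
    """Import every sub-module of `package_name` and nest it by dotted path."""
    root = importlib.import_module(package_name)
    registry = SimpleNamespace()
    for info in pkgutil.walk_packages(root.__path__, prefix=f"{package_name}."):
        module = importlib.import_module(info.name)
        parts = info.name.split(".")[1:]  # drop the package name itself
        parent = registry
        for part in parts[:-1]:
            if not hasattr(parent, part):
                setattr(parent, part, SimpleNamespace())
            parent = getattr(parent, part)
        setattr(parent, parts[-1], module)
    return registry


# Usage (assuming a local package "my_nodes" containing my_nodes/math/numpy.py):
# nodes = register_package("my_nodes")
# nodes.math.numpy  # -> the imported my_nodes.math.numpy module
```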
- Make nodes delayed (only node.execute() executes the node)
I am a bit confused how this relates to our existing paradigms and terminology. Currently we call it `run()` to execute the node functionality. The old default (`main:HEAD:workflow.node.Node`) is to be delayed (`run_on_updates=False`, `update_on_instantiation=False`); the new default (#729 `workflow.function.Function`) is to aggressively re-run the node (both flags set to `True`) whenever the input changes (and the new input is compliant with the type hints). pyiron/pyiron_contrib#729 has a `Slow(Function)` class available with the old delayed defaults.
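As a purely conceptual illustration (this is a toy class, not the actual pyiron node implementation), the difference between the two sets of defaults could be pictured roughly like this:

```python
# Toy illustration of "delayed" vs. "eager" nodes; the class and flags only
# mimic the behaviour described above, they are not the pyiron_workflow API.
class ToyNode:
    def __init__(self, func, run_on_updates=False, update_on_instantiation=False, **inputs):
        self.func = func
        self.inputs = inputs
        self.output = None
        self.run_on_updates = run_on_updates
        if update_on_instantiation:
            self.run()

    def set_input(self, **kwargs):
        self.inputs.update(kwargs)
        if self.run_on_updates:
            self.run()  # eager: re-run whenever the input changes

    def run(self):
        self.output = self.func(**self.inputs)
        return self.output


delayed = ToyNode(lambda x: x + 1, x=1)  # old default: nothing runs yet
delayed.run()                            # runs only when explicitly asked

eager = ToyNode(lambda x: x + 1, run_on_updates=True, update_on_instantiation=True, x=1)
eager.set_input(x=2)                     # re-runs immediately on update
```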
Thanks, @Liam, for your quick reply and thoughts. We can set up a Zoom meeting in the next few days to discuss these ideas in more detail.
Regarding the last point: this is identical to the last topic in the issue 'Suggestions and issues for the workflow class' (pyiron/pyiron_contrib#756). Regarding the keyword, I am completely open. We can keep it 'run', but I am also open to something like 'compute', 'evaluate', or 'execute'. What I would like is a workflow of delayed nodes that I can execute with a single command (i.e. workflow.run() or top_node.run()) and that runs all necessary nodes.
We can set up a Zoom meeting for one of the next days to discuss these ideas in more detail.
Sure thing. I'm available Thursday and Friday >1500 CET (>1700 CET would be even better, as then I can help get the kids out the door), and to set a Thursday meeting I'll need ~12 hours heads-up so that I know to set an alarm.
Since we also plan to discuss this on Monday, I'm also ok waiting until then.
Regarding the last point: ... What I would like is a workflow of delayed nodes that I can execute with a single command (i.e. workflow.run() or top_node.run()) and that runs all necessary nodes.
Super! More detail over on pyiron/pyiron_contrib#756, but I think we're well-aligned on wanting exactly such a feature/interface available to users.
For me, today (Thursday) >17 CET or e.g. next Monday would be OK. To prepare for and focus the Monday meeting, it may be good to have a short meeting today.
For me, today (Thursday) >17 CET or e.g. next Monday would be OK. To prepare for and focus the Monday meeting, it may be good to have a short meeting today.
@JNmpi sounds good -- I'll be in the pyiron Zoom room around 1715
Great! I will be there too.
@samwaseda, here are the existing thoughts on what we were talking about. Some of it is a bit outdated (e.g. there are no longer fast/slow/single-value nodes -- this has all been unified), but other parts are relevant.
I guess what we had in mind may be narrower in scope, i.e. something like:

```python
from pyiron_workflow import Workflow
import math


@Workflow.wrap.as_function_node("y")
def Foo(n: int, k: int):
    return math.perm(n, k)


Foo.to_py_file("scratch")
```
Populating `scratch.py` with something along the lines of:
```python
from pyiron_workflow import Workflow


@Workflow.wrap.as_function_node("y")
def Foo(n: int, k: int):
    import math
    return math.perm(n, k)
```
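Note that `to_py_file` above is a proposed method, not an existing pyiron_workflow API. As a minimal sketch of the idea, written as a standalone helper that is handed the plain (undecorated) function, it might look like the following; the names `export_as_node` and `output_label` are made up for illustration:

```python
# Hypothetical sketch: write the source of a plain function into <stem>.py
# together with the node decorator, so the file can later be imported or
# registered as a node module.
import inspect
from pathlib import Path


def export_as_node(func, stem: str, directory: str = ".", output_label: str = "y") -> Path:
    source = inspect.getsource(func)  # source of the undecorated function
    path = Path(directory) / f"{stem}.py"
    text = (
        "from pyiron_workflow import Workflow\n\n\n"
        f'@Workflow.wrap.as_function_node("{output_label}")\n'
        f"{source}"
    )
    # NOTE: imports used by `func` (e.g. `math`) are not handled here; moving
    # them inside the function body, as in the target file above, is exactly
    # the dependency question discussed later in this thread.
    path.write_text(text)
    return path
```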
Per our conversation, I've assigned you here, but this issue is quite old and I think we can modify the scope of it as we go.
Skimming over this and thinking a bit, I wonder how the export method would handle things if a file already exists? Can it cleverly merge in nodes of the same name by overwriting and simply append nodes of a new name? One of the tricky bits we anticipated was dependency management; if we get merging running, how will the merge handle the dependencies changing, e.g. the old version required `from foo import bar` but the new version does not? Can (should?) we leverage AI for any of this? This sort of "extract and modify very slightly" is something I've found GPT to be passably good at.
Can't we simply set something like `overwrite = True`? My biggest problem right now is that I don't really know how we can automatically detect `math` and write `import math` in the file; but whether the line can be correctly exported, or we can only see that there are undefined variables, I don't think it's a deal breaker to overwrite or do nothing at all when the file already exists. Or did I miss something crucial here, maybe?
Finding the necessary dependencies is definitely the more serious issue -- that's rather a deal breaker. Although, to get the ball rolling, we could do something like saying all imports need to happen inside the node definition, and if they don't, things will break and it's your fault.
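As a rough sketch of the "we can only see that there are undefined variables" route, one could walk the function's AST and collect names that are neither built-ins nor bound inside the function; anything left over (like `math`) would need a corresponding import written into the exported file. This is only an illustration, not an existing utility; `undefined_names` is a made-up helper:

```python
# Rough sketch: find names a function uses but does not define itself, as a
# first guess at which imports the exported file would need.
import ast
import builtins
import inspect
import textwrap


def undefined_names(func) -> set[str]:
    tree = ast.parse(textwrap.dedent(inspect.getsource(func)))
    loaded, bound = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            (loaded if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            bound.update(alias.asname or alias.name.split(".")[0] for alias in node.names)
    return loaded - bound - set(dir(builtins))


def needs_math(n: int, k: int):
    return math.perm(n, k)  # noqa: F821 -- "math" is intentionally not imported


print(undefined_names(needs_math))  # -> {'math'}
```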
Overwriting is fine, but unless we want each node to be completely alone in its own file, we need some way of merging content into an existing node -- at which point we need to be able to tell if we're overriding old content or appending new content. Unlike determining imports, I don't see any fundamental technical barrier here or some missing knowledge, but it's still a matter of doing the legwork to get a parser running that knows (at a minimum) how to isolate and replace the decorated function definition when a new one with the same name is provided.
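A starting point for such a parser could be `ast`: parse the existing file, locate a top-level (decorated) function definition with the same name, and splice the new source over its line span, or append if there is no match. The following is a hedged sketch of that idea, not part of any existing pyiron_workflow API; `merge_node_source` is a made-up name, and it assumes nodes live as top-level decorated functions in the module:

```python
# Sketch: merge a new node definition into an existing module by replacing the
# function of the same name (using its source line span), or appending it.
import ast
from pathlib import Path


def merge_node_source(module_path: str, new_source: str) -> None:
    new_name = next(
        n.name for n in ast.parse(new_source).body if isinstance(n, ast.FunctionDef)
    )
    path = Path(module_path)
    lines = path.read_text().splitlines(keepends=True)
    for node in ast.parse("".join(lines)).body:
        if isinstance(node, ast.FunctionDef) and node.name == new_name:
            # Replace the old definition in place; include decorator lines in
            # the replaced span via the first decorator's line number.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
            lines[start:node.end_lineno] = [new_source.rstrip() + "\n"]
            break
    else:
        lines.append("\n\n" + new_source.rstrip() + "\n")
    path.write_text("".join(lines))
```

Import merging (the `from foo import bar` case above) would still need a separate pass over the module's import statements.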
A simple application of these classes and their functionality is given below:
Application of the node-based workflow class
Objective: Create a module that looks and feels like pyiron but is based on nodes. Key criteria are:
- Make adding new nodes easy (via decorator or simply adding functions to a file)
- Use directories to build up a class schema (e.g. workflow.create.math.numpy)
- Make nodes delayed (only node.execute() executes the node)
Create an example node
Note: The decorator is a convenience function. You can directly define the node function in the corresponding Python file.
Construct an example workflow
Since I cannot attach .py files to the text, I append the code below:
EDIT: GitHub syntax highlighting