pyiron / pyironFlow

react xyflow for pyiron
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Next steps (18.10.2024) #7

Open Tara-Lakshmipathy opened 4 days ago

Tara-Lakshmipathy commented 4 days ago
liamhuber commented 4 days ago

I think some aspect of this should be handled inside pyiron_workflow and just linked here. E.g., I could imagine something like this belonging nicely in the backend module:

def to_macro(*nodes: StaticNode, class_name: str | None = None) -> type[Macro]:
    """
    Given a set of sibling node instances, creates a new macro class where all 
    connections internal to the passed nodes are reformed in the graph creation 
    method, and all current values for external inputs (i.e. those which do not 
    have a connection to a sibling in `*nodes`) are used as defaults for the new 
    macro IO.

    A class name can be provided explicitly, but otherwise will be constructed 
    using data from `*nodes`.

    TODO: Maybe expose a kwarg for modifying the IO interface?
    """

That is pretty rough; I could alternatively imagine returning a string of the code defining the graph creation method (or both class and raw code). It is also incomplete as what you really want to be able to do is this and replacing *nodes in the graph with an instance of the new macro on-the-fly. (Then there is the corollary step of taking a macro instance and on-the-fly replacing it with all its individual parts, i.e. "unpacking" the macro onto its parent level.)

Anyway, my main point is that it seems fair to make the backend responsible for most of this functionality.

Tara-Lakshmipathy commented 4 days ago

That's definitely an idea I can get behind! Then from the GUI side, we would just have to pass the selected nodes to def_macro.

By the way @liamhuber , is there a way for me to access information about other nodes from within a node before the workflow has run even once? For example, I want the label of a node that is connected to the current node to be used from within the current node.

Something like:


@as_function_node("something")
def CurrentNode(input1):

   if (input1.original_owner.label) == 'InterestingNode':
      variable1 = interesting_function()
   else:
      variable1 = boring

   return variable1

I mean, of course this is possible if the user sets up the output in the previous node to return something which includes the original_owner properties. But my question is, would it be a consideration for future development so that this is a default?

Or maybe something even more general and not restricted to the nodes that are connected? This was the angle I was going for when I mentioned the master data class thing in the specs.

liamhuber commented 4 days ago

You could define helper methods on nodes that are aware of its graph state (eg the wf.get_master_dataclass()), that's not a problem at all -- but the actual node operation (.run()) should not lean on graph construction this way.

Function nodes hold and process data, the node instance itself should not be used as data; similarly macros are just a graph of other nodes and so ultimately behave the same way. I'm sure you can strong-arm a set of nodes to go in this direction, but I strongly recommend against it. Having function node innards depend on parent graph state is highly non-functional and I think it will only bring headaches in the long run.

Tara-Lakshmipathy commented 4 days ago

@liamhuber Oh, I definitely do not want to run nodes from within other nodes. Nor do I want nodes to have access to generated input values from other nodes without being connected (that's what connections are there for lol). I would only like information about other nodes in the graph, and the user-defined inputs (like the number of repetitions of a unit cell that a user defined somewhere else in the workflow).

The thing is, openBIS can store datasets with a graph representation where stuff are linked together: image

I would like workflows to be represented on openBIS as close to 1:1 as possible. And following up on @JNmpi of having openBIS directly in the nodes, I would need some way for each node to know where it sits in the graph.

liamhuber commented 4 days ago

Do you mean OpenBis directly in the node, as in its run functionality exploits graph location info and talks to OpenBis, or on the node, as in there is some custom method like send_to_OpenBis or locate in OpenBis or whatever? I am still very against the former, but the later sounds fine and should be quite easy.

Tara-Lakshmipathy commented 4 days ago

Definitely the later!! I hadn't even considered the former XD

liamhuber commented 4 days ago

Ok good 😂 Without being very well informed about OpenBis, I can't make an absolute promise, but it certainly seems to me like it should be easy to access all the data it might want by adding a method to nodes. Even better might just be to write a function that takes a node (or nodes) as inputs, so we don't add any OpenBis dependency to the core infrastructure.

Tara-Lakshmipathy commented 4 days ago

No, openBIS should definitely not be a dependency unless we include a bunch of other ones like CKAN, Zenodo etc (which I'm against including in the core infra). There are other candidates for publishing data, and I don't think we should make openBIS "the" knowledge-base for our datasets.