Discussion: Loops / Iterations / Ranges

ThePixelDeveloper commented 3 years ago

Morning 👋🏼

I'm opening this to start a discussion on working with loops, iterations and ranges. I'm trying to work on something like Node Red's Loops

CleanShot 2021-08-24 at 15 10 03@2x

or the way Unreal Engine Blueprint does it:

CleanShot 2021-08-24 at 15 15 16@2x

but struggling to come up with a solution for it within Baklava.

The way I've currently implemented it, is to make each node to the right of the Iterate aware that it's dealing with an iterable value and treat it differently, but that feels like a hack.

CleanShot 2021-08-24 at 15 18 11@2x

I was wondering if any thought has been made into how to deal with this? I'm thinking I need to come up with a different way of executing the graph.

newcat commented 3 years ago

Yes indeed, the Baklava Engine was never designed to handle loops. I guess the best way to implement them is to create your own Engine. You can have a look at the original Engine implementation and add the necessary parts for loops.

In Baklava v2 loops won't be supported out of the box either but they will be a lot easier to implement.

ThePixelDeveloper commented 3 years ago

@newcat Out of interest, do you know what algorithms I need to look at to make this work? Self admittedly I lack a bit of knowledge in this domain. I notice PyFlow allows you to call the outputInterfaces as methods, ie ...

    def compute(self, *args, **kwargs):
        ls = self.array.getData()
        if len(ls) == 0:
            self.completed.call(*args, **kwargs)
        else:
            for i in ls:
                self.elem.setData(i)
                push(self.elem)
                self.loopBody.call(*args, **kwargs)
            self.completed.call(*args, **kwargs)

Source

newcat commented 3 years ago

I did some more research and this problem is indeed more complex than I initially expected.

PyFlow & Unreal

Both, PyFlow and Unreal, use an explicit execution order defined by special inputs and outputs on the nodes to achieve this. This approach, however, poses another difficulty.

Let's take the image from the Unreal Blueprint Engine you posted as an example:

The graph execution starts at the OnActorBeginOverlap node, triggered by some external event
The execution flow goes on to the ForLoop node
The execution flow goes to the Print String node
To execute, the Print String node requests the value of "In String". This value is dependent on the Append Node
The Append Node in turn is dependent on the value of both the Iteration Prefix node as well as the unnamed node below (I'll just call it the String converter node, not sure what it does)
The Iteration Prefix node has no dependencies, so it's output value is passed on to the input interface "A" of the Append Node
The String converter node on the other hand is dependent on the ForLoop node. However, as the ForLoop node has already been executed, the value of "Index" is taken as the input value to the String converter node. It executes and passes the output value to the Append Node
The Append Node now has all the values it needs, so it executes and passes the output value to the "In String" interface of the Print String node
The Print String node now has all the values it needs, so it executes.
As the output execution interface of the Print String node is not connected, it marks the end of the loop body.

The parts 3 - 10 will be repeated as many times as defined in the ForLoop node. Afterwards, the execution flow goes to the node connected to the "Completed" interface of the ForLoop node (which is not connected in the example, so the graph execution will end there).

As you can see, there is quite a bit going on. The challenge is that the approach combines two ways to execute a graph:

The execution flow is going "forward" from node to node, meaning that after one node has been executed, it passes its data to the connected nodes and then the control flow goes to the next node via the special execution connections
Dependency graph: The value of "In String" is dependent on the Append node, which is in turn dependent on other nodes and so on. Essentially, the algorithm starts at the output end of a graph (in case of e. g. Kahn's algorithm all nodes that have no outgoing connections) and finds all nodes that the starting node(s) is/are dependent on. These nodes can also have dependencies and so on. So it is basically the reverse way. The approach is called topological sorting.

There are a lot of edge cases with this approach and to be honest, I don't know how Unity resolves some of them. Let's take this graph as an example (please excuse my bad photo editing skills and also imagine the two triggers were different):

impossible_graph

What would happen with the bottom trigger? It would execute the bottom Print String node, which is dependent on the String converter node which is in turn dependent on the ForLoop node. However, the ForLoop node has not been executed and never will be - so what value should "Index" have?

I haven't really looked to much into these kinds of graphs yet, so maybe there is an easy way to resolve this issue that I don't know. I just want to warn you about these cases that need to be handled :)

NodeRED

NodeRED uses a similar approach as the ones mentioned above, but in a simplified way that avoids the edge cases. It essentially combines the execution flow and the data flow into a single flow. This is done by using a single message, that is transferred from node to node. The message contains all the output values of a node.

Unfortunately, this also means that nodes can only have a single input interface. This can be very limiting in a lot of cases.

Here is how the graph example graph that you posted is being executed in NodeRED:

The Loop node receives the message.
The message has no index value, so the Loop node attaches the starting index to the message. The index on the message can also be used by subsequent nodes.
The message is passed to the node(s) connected to the bottom output of the Loop node. The node(s) are executed
The last node in the loop body returns the message back to the Loop node
The loop increases the index it got from the message. If the loop condition is still fulfilled, we jump to step 3
Otherwise, the index is stripped from the message and the message is passed to the node connected to the top output of the Loop node.

As you can see, the implementation is a lot easier and there are not as many edge cases to worry about. On the downside, this approach is very limiting and everything needs to be built around having this one message that is passed around.

Summary

Okay so I wrote a lot of text, but these are the main takeaways:

Loops in dataflow programming are hard, especially when using graphical input methods
There are multiple approaches for this - each with pros and cons.

In Baklava v2, things will get a lot easier because it will have subgraphs. So you could implement the loop body as a subgraph, which is similar to approach 1 but without the edge cases. Also, Baklava v2 will have two different engines:

Dependency/backward based, as the engine in Baklava v1 already is
Forward, similar to the approaches above

Unfortunately, I doubt that v2 will be officially released in this year.

So overall probably not what you wanted to hear - essentially there is no simple solution to implement loops in Baklava v1. You'd pretty much have to implement your own engine that is using one of the approaches mentioned above.

By the way: Huge thanks for the sponsorship! I really appreciate it!

ThePixelDeveloper commented 3 years ago

Thanks for the reply @newcat; I appreciate the time and effort you put into it and it was the reason I sent you some monies. 💰

I've been mulling it over and I'll be trying the approach of Unreal and Unity. They make a distinction which make sense to my monkey brain, and that's execution flow and data are separate concepts. It's easier to see in Unity here:

Source

As you rightly pointed out, green arrows (execution) flow left to right and the data is gathered for the current node, from right to left. It's the reason I've been struggling to conceptualise it within Baklava JS.

So taking their for each loop example:

There's some magic happening under the hood where the string nodes know they're operating on the element of the collection, but that I'm not too worried about.

What would happen with the bottom trigger? It would execute the bottom Print String node, which is dependent on the String converter node which is in turn dependent on the ForLoop node. However, the ForLoop node has not been executed and never will be - so what value should "Index" have?

I don't have a copy of Unity / Unreal (maybe I should), but I have a gut feeling that's not possible.

I'll let you know if I end up working on my own engine to support such flows.

newcat / baklavajs