Thoughts on aborting and/or restarting calculations?

(this is less of an issue and more of a discussion question around the Engine, apologies it's a bit long)

I've been wondering if the Engine can/should be changed to abort and/or restart a calculation if a new one is requested while a previous one is in progress. I ended up running into a small issue with the Engine's calculateOnChange behavior due to some event ordering triggering an onChange(false) immediately before an onChange(true), in which case the onChange(true) essentially gets dropped (and the node ordering does not get recalculated) because the calculation started by the onChange(false) was still in progress. I was able to work around that by making some of my code asynchronous to ensure the onChange(true) fires before the onChange(false) (see https://github.com/skipscooters/mapgraphs/blob/e4795f877b250f5facb4ffc1fb92a1a58226a12d/app/src/nodes/DynamicInputsNodeBase.ts#L42 for the actual implementation if you're curious), but that's what initially prompted my thinking about how concurrent calculation requests are/should be handled by an Engine.

The second related concern I ran into was with calculations that take a while. Since calculate is async, it's easy to have instances where the user is changing the graph while a calculation is in progress, and right now that doesn't play too well with calculateOnChange (since a change during a calculation doesn't trigger a new calculation, it's just dropped).

So one idea I had would be to make the Engine restart a new calculation once the first one completes if a change occurred during that calculation (and potentially looping repeatedly if there continue to be changes during the restarted calculations). Then a potential improvement on top of that could be to allow calculations to be aborted, so that if the graph is changed during a calculation the Engine doesn't need to finish calculating all the nodes before starting a new calculation (this has some potential issues if node calculations aren't pure functions, but there are always going to be potential race conditions as long as graph changes are allowed to happen concurrently with calculations, so the calculations kind of need to be pure functions regardless).

One other topic I was also thinking about was the fact that Engine.calculate doesn't have any mutual exclusion (except for internal onChange-triggered calculations), which could be problematic if multiple long-running calculations are triggered concurrently with different data, since the intermediate calculation results are stored on the shared graph (nodes and interface values) during a calculation. You could imagine a case where a node's calculate method is async and takes a nondeterministic amount of time (e.g. making a network request), which could result in a second concurrent calculate() actually "passing" the first as it makes its way through the graph, resulting in potentially inconsistent/corrupt data throughout the graph. I don't really see a good way to prevent this other than mutual exclusion, unless each calculation operated on a cloned graph or something.

I'm happy to jump in and try making some of these changes to see how they work out, but wanted to check if you already had any thoughts on the topic first.

This is a good point. Honestly, when I designed the engine plugin, I never had long running calculations in mind. The ability for the calculate function of a node to return a promise was just added, because it was pretty easy to implement :wink: But of course, I forgot about the implications.

I like your idea of "remembering" when changes happen during calculations and starting a new calculation after the running one has finished. Also, cancelling running calculations is a pretty nice feature. I think, both of these can be implemented without too much effort. I can do this, but I am currently very limited on time unfortunately, so this might take a bit until I have time to do (and properly test) this. But of course, if you have time, I very much appreciate a PR.

In general, though, I think the Engine plugin (and pretty much the whole current implementation of Baklava) is deeply flawed in its concept. IMO, dataflow programming is a different/graphical way of functional programming. But the current implementation heavily relies on side-effects in pretty much every aspect (as you already noticed, you can't write pure calculation functions at the moment). I stumbled across this, when I tried to run my calculations in a web worker.

Currently, I am thinking a lot about what Baklava v2 could look like. The general idea is to make a Baklava graph like a compute graph. So for example (connected) input interfaces would not have a value anymore, since this value is being generated during the calculation. This would solve a lot of the problems with concurrency and separate memory in e. g. web workers. So a calculate function would receive everything it needs (meaning all inputs and option values) as parameters and return all the output values. This means, it does not need side effects or depends on the actual graph instance. Of course, this limits things like setting option values or performing other side effects on the graph, but some of that can be worked around and in general, there are clearly more benefits to this approach than there are limitations. Unfortunately, I don't think that Baklava v2 will be finished this year, because I want to pretty much rewrite everything and currently I only have plans and have not implemented anything yet.

Thanks for the super quick response!

I actually just moved my computation to a web workers last night, and I created an abstract "MemoizedNode" that has an abstract internalCalculate function for subclasses to implement that receives inputs as params and applies the returned value to the output (I'll probably add support for options as params and multiple outputs soon) -- it's been a really nice api for implementing nodes and made it easy to avoid unnecessary recalculations, so I agree it feels like a good direction for a v2.

If you're curious, I ended up starting a rewrite of an Engine plugin that handles concurrency and long-running calculations nicely - https://github.com/skipscooters/mapgraphs/commit/59907e89c18815a54ea96e9ee815e896b9c921bb

I spent some time studying rete.js's Engine as well, and while they do support concurrent node calculations out of the box, their implementation also had some aspects I wasn't a huge fan of (one of the big ones being that they have per-node mutexes to deal with potential concurrent access during calculations), so I ended up writing my own calculation engine design.

There's still a lot to be done on my Engine, and some things are definitely more awkward than I'd like, due to a self-imposed constraint of not modifying any baklavajs libraries for now, but from a little bit of testing it seems to work nicely so far with some dummy DAGs.

The 2 big changes in calculation compared to the stock Engine are state management and execution ordering:

the first step of a calculation is to gather all of the state (DAG configuration and "user input" via Options) from the editor (synchronously, so it's impossible for it to change underneath us during this phase) into a State data structure that is independent from the editor (see generateCalculationState). Once we've gathered that state synchronously, it's fine to start awaiting the async calculations and it no longer matters if the editor state changes at this point since we're operating on an independent copy of the data. It also means you can safely cancel/restart a calculation with no risk of any strange/inconsistent data, since no editor state is modified or shared until all calculations have completed (at which point a synchronous operation applies the results to the editor atomically, but only if the calculation wasn't already canceled/superseded)
During that initial state-gathering time, I also identify all nodes with zero unsatisfied input interfaces (i.e. in-degree of zero) which become the initial nodes of the calculation phase. The meat of the calculation is essentially a standard topological sort, but importantly, node calculations are initiated as part of the topological sort algorithm, rather than picking a single topological-ordering upfront and then awaiting one node at a time. Each time we visit a node with in-degree zero (see processNode) we calculate its outputs, identify all outgoing edges, "remove" them from the graph, and repeat the process with any nodes that now have an in-degree of zero after those edges were removed. This was implemented carefully so that all nodes with satisfied input dependencies can start executing immediately, meaning a bunch of async node calculations can happen concurrently if there's a large fan-out in the graph, and therefore the whole process can complete much faster.

Thanks, I'll check as soon as I have more time and integrate that into the Engine plugin.

newcat / baklavajs

Thoughts on aborting and/or restarting calculations? #55