Walk the graph only once for eagerly computed nodes

giacomoran commented 1 year ago

This is an attempt at applying @lord's ideas in How To Recalculate a Spreadsheet to the Reactively algorithm. I wanted to better understand how signal libraries relate to incremental computation solutions outside the JS ecosystem.

I'm opening this PR hoping it will be interesting to folks researching signals and incremental computation, I don't expect it to be merged. I've been able to slightly improve performance in some of the benchmarks, but complicating the code it's likely not worth it.

results

Performance results on the "dynamic" and "kairo" benchmarks. Performance improves slightly in the "kairo" benchmarks, which use register effects. Performance is mostly the same on the "dynamic" benchmark that lazily read nodes without registering effects.

Here's a brief explanation of the changes.

I add a concept of "eager" nodes. Eager nodes are identified as nodes that require immediate recomputation when their sources change. In our case they are nodes that have an effect as descendant, since effects are eager. The algorithm updates eager nodes similarly to Jane Street's incremental algorithm, while using the existing Reactively algorithm for non-eager nodes. This approach aims to optimize the performance of the library by avoiding walking the graph unnecessarily.

There are two broad ways to go about incremental computation: dirty marking and topological sorting. My impression is that most of the JS signals libraries, including Reactively, are based some variant of dirty marking. Instead, the incremental algorithm is based on the idea of topological sorting. My understanding is that it is a better fit for eager nodes.

dirty marking: when the value of an input node changes, we walk the graph from the input to every node affected by the change, marking each of them as dirty. Before reading a node, we check whether it's dirty. If it is we walk the graph in the opposing direction recomputing each node, otherwise we can immediately return the its value.
topological sorting: assuming we have a topological order for the graph, when the value of an input node changes we can walk the graph once, recomputing each affected node. The topological order guarantees that we visit a node at most once, only after its sources. We can obtain a topological order by tracking the partial height of each node, the invariant is that the partial height of a node is larger than the partial height of its sources. More informations can be found in this talk.

There are two ways to access the value of a node: by lazily reading its value or by registering an effect. Here's a small example:

const a = reactive(0);
const b = reactive(() => a() + 1);

for (let i = 0; i < 20; i++) {
  a.set(i);
}
// Lazily read the value of `b`, then log.
console.log(b.get());

const a = reactive(0);
const b = reactive(() => a() + 1);

// Eagerly log the value of `b`.
reactive(() => console.log(b()), { effect: true });
for (let i = 0; i < 20; i++) {
  a.set(i);
}

In the first case dirty marking should perfom best: we walk the graph down 20 times, and walk it up 1 time. b is recomputed only once. If we were to compute nodes according to the topological order we would compute b 20 times instead.

In the second case recomputing nodes according to the topological order should perfom best: we walk the graph down 20 times, each time recomputing b. If we were to use dirty marking we would walk the graph 2 * 20 times, down and up at each change.

My intuition is that if the graph were deeper and the computations more expensive, then we could get better performance by using one algorithm for lazy nodes and another for eager nodes. Assuming there are no major mistakes in my implementation, the benchmark results in the picture above kind of back up this intuition but are not very satisfying. My take-away is that performance is bottlenecked by how we construct the graph edges rather than the algorithm employed to walk the graph.

Some references I've found useful:

lord commented 1 year ago

Not familiar with the specifics of this library, but do you have benchmarks that show a change to some input that doesn’t actually cause any value changes to some massive graph? This will be where the topological algorithm shines best; I’d guess the biggest speedup would not come from reducing the walks from 2 to 1, but from skipping walking a node altogether.

milomg commented 1 year ago

@lord That's an interesting group of benchmarks that I don't think are currently tested for. I'll look into creating them!

@giacomoran Wow, the implementation of a topological sort is simpler than than I was expecting, thanks for sharing this!

I'm curious if either of you have already come across research for an idea I've been toying with for Solid&Reactively: async transactions. The way we're planning for this to work is to entirely fork the graph, modify the forked copy and if any reactive nodes return a promise, wait for the promises in the transaction to complete before merging the graph back in. This is similar in concept to a database transaction that happens on a separate thread, or how createTransition+Suspense works in Solid. I looked at some papers like Shake before Building to see if they handled async, but there isn't a lot on the idea of dealing with consistency of multiple transactions.

lord commented 1 year ago

What is the purpose of forking, to allow users to access prior values, and to allow transactions to rollback? I haven't looked at this stuff in a few years, but that sounds reasonable to me. Although if multiple transactions can exist concurrently, I wonder if that could get extremely complicated very quickly

milomg commented 1 year ago

Yes, multiple transactions can occur concurrently, and the goal is to reduce recalculation of the async reactive nodes to the minimal amount (those that must be calculated if all the root nodes had been set, and then async nodes had run), which may include cancelling promises when two transactions notice a conflict.

Backing up a bit, in Solid, the idea of a transition/transaction is to be able to model things such as routing using reactive nodes, and allow contents on the current route to change while data is being fetched on the next route. For example, if you have a counter component on /, and a JSON fetch on /feed, when you click to navigate to /feed you want to see a progress bar and still be able to edit the counter. Then, once the data has been fetched you want to commit to navigating to /feed and destroy the counter. (The api for transactions in theory also allows for delayed execution of effects, or storing multiple copies of the graph for rollbacks like you mentioned).

giacomoran commented 1 year ago

@lord Interesting, yes! By walking the graph down just once, you can stop early and avoid most of the walk altogether.

@modderme123 I'm curious about your approach. I'm not too familiar with transactions, I have a few questions. Could you implement transactions the way you are describing for a fully synchronous graph? For instance by rolling back all the changes when the recomputation hits a special node. If so, it could be similar to what signia does, not sure if they support async nodes. In your example, what is the main advantage of using a transaction? Over, for instance, starting the fetch in a effect and then writing the JSON data to a signal?

milomg commented 1 year ago

In your example, what is the main advantage of using a transaction? Over, for instance, starting the fetch in a effect and then writing the JSON data to a signal?

Continuing on the router example, one of the advantages is it allows you to declare asynchronous computations that aren't directly tied to what you are manipulating. So for example the custom link tags can just onClick create a transaction and navigate (and in fact we do this in solid-router), and then data can be fetched automatically in the new route (based on the location signal), while keeping the current page the same (effects that read the location signal won't be run until the transaction has finished loading data).

While this could be implemented without the reactive algorithm understanding concurrency and having an effect/promise write to a signal, I believe there is power in the inversion of control enabled by having computations declare their own async dependencies instead of event listeners declaring the async dependencies.

If so, it could be similar to what signia does, not sure if they support async nodes.

Yeah, Signia doesn't have async nodes, but in other ways it is very much related to Signia's transactions, except by waiting for promises before evaluating the transaction effects we lose a lot of guarantees about the original main graph changing (which will queue effects that we actually are supposed to run).

milomg commented 1 year ago

I've been thinking about this depth based marking scheme recently and discovered an interesting behavior that I couldn't seem to fix (I believe this isn't a problem for incremental because bind nodes provide a different API):

test("dynamically changes dependencies", () => {
  const s = reactive(false);
  const s2 = reactive(2);
  let callCount = 0;
  const a = reactive(() => s2.get());
  const b = reactive(() => {
    callCount++;
    return s2.get();
  });
  const c = reactive(() => (s.get() ? a.get() : b.get()), { effect: true });
  stabilize();
  expect(callCount).toBe(1);
  s.set(true);
  s2.set(3);
  stabilize();
  expect(callCount).toBe(1);
});

graph TD
  S((S)) ---> C
  A --> C
  B -.-> C

graph TD
  S((S)) ---> C
  A -.-> C
  B --> C

giacomoran commented 1 year ago

Current behavior

The node's sources are updated in this.update(). After calling s.set(true), the algorithm still thinks that the s2 node is eager. On s2.set(3), it therefore propagates the update to the b node. If you call stabilize() between s.set(true) and s2.set(3), b would not be recomputed unnecessarily.

Possible solution

I took a brief look at incremental's bind, you are right, bind does more!

I've briefly reviewed the algorithm and looked at incremental's bind, so take my comment with a grain of salt.

Incrementals allows "marking" certain nodes as dynamic, using bind. In your example, node c is dynamic. Node s is in the left-hand side of the dynamic node while a and b are in the right-hand side.

Incremental forces the left-hand side of dynamic nodes to have smaller height than the right-hand side. Therefore, in your example where s and s2 are updated at the same time, c would be recomputed before b. At this point, the graph can be walked upstream from c, visiting a instead of b.

The whole right-hand side is recomputed, as if we were running c for the first time.

Ref

milomg / reactively

Walk the graph only once for eagerly computed nodes #13

Current behavior

Possible solution