tc39 / proposal-signals

A proposal to add signals to JavaScript.

Document motivation for dynamic dependency tracking #9

Open · benlesh opened 11 months ago

benlesh commented 11 months ago

From the README:

For example, if the computed signal has a conditional in it, and only depends on a particular other signal during one of the branches, then if that branch is not hit, a dependency is not recorded, and an update to that signal does not trigger recalculation.

So if someone has the following code, are we going to meet their expectations?

const a = new Signal.State(100);
const b = new Signal.State(200);

const c = new Signal.Computed(() => {
  // Which signal gets read depends on an untracked coin flip.
  if (Math.random() > 0.5) {
    return a.get();
  }
  return b.get();
});

Either branch is potentially unreachable.

So I have questions:

  1. Are we saying that they need to be able to sort of execute the code in their head and realize what dependencies may or may not trigger the notification of the computed property?
  2. Are the dependencies added when a subsequent run hits them?
  3. Are they removed if they don't get hit?
  4. Should there be a language feature for reading a signal that is required to appear at the beginning of a function? (This would force all dependencies to be read at the top.) Or maybe even a language feature for computed signals, or for signals in general?

There's not really a precedent for this sort of thing in the language. There are definitely various precedents in frameworks. In particular the useEffect deps array from React comes to mind, and honestly I've never been bitten by stale references so much in my life as when I switched to React hooks (prior to there being lint rules to help with this).

NullVoxPopuli commented 11 months ago

Are we saying that they need to be able to sort of execute the code in their head and realize what dependencies may or may not trigger the notification of the computed property?

imo, yes -- but I think this can be solved with documentation and debugging tools. In userland we could attempt a sophisticated lint, though that might require aggressive static analysis if the signals come from "far away" (maybe TS can help here?).

The specific example may be a little too narrow, but generally (and you probably know this -- I'm stating it to share my background with this behavior, and so that folks who are unfamiliar have the context; I'm also "thinking out loud" to myself), the way it works in real apps is that you don't actually want to run b() until the condition is false, and the condition is usually based on other signals. So when the condition-signal becomes false, b runs, and the overall Computed entangles with whatever dependencies are detected while evaluating b().
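Something like this, to make that shape concrete (using the proposal's .get()/.set() API; the cond signal is just a stand-in for "the condition is usually based on other signals"):

const cond = new Signal.State(true);
const a = new Signal.State(100);
const b = new Signal.State(200);

const c = new Signal.Computed(() => {
  // While cond is true, only cond and a are tracked; updates to b are ignored.
  if (cond.get()) {
    return a.get();
  }
  // Once cond flips to false, the next evaluation reads b, so c now
  // entangles with cond and b (and drops a).
  return b.get();
});

c.get();          // 100 -- c depends on { cond, a }
cond.set(false);
c.get();          // 200 -- c depends on { cond, b }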

If someone is expecting b()'s signals to be entangled, we'll need to be sure debugging tools show that only a()'s signals have been entangled in the case where the condition is true.

OR maybe we can disallow "uncertainty" in "Computeds"? Maybe that'd eliminate any potential confusion altogether?

Are the dependencies added when a subsequent run hits them?

yes :tada: (at least, this is how Starbeam and Glimmer work, and it's what I'd expect of any "auto-tracking" system that doesn't require dependency arrays, which is what Ember/Glimmer moved away from)

Are they removed if they don't get hit?

can't remove what wasn't added?

This would force all dependencies to be read at the top.

I don't think I'd want this, as there are scenarios where you want the lazy and deferred evaluation / entangling.

Is it possible for the language to only disallow randomness in conditions in computeds? What other sorts of things would make conditions hard to "make stable"? Would this be too hard of a task?

benlesh commented 11 months ago

can't remove what wasn't added?

Oh. Sorry. I meant: if, in the example above, a is hit on the first run but not the second, do we remove it from the dependencies list? I think no. But it's so weird that I'm not sure.

EisenbergEffect commented 11 months ago

The idea we've been discussing, and what I think most signal implementations do, is that each time the computed is evaluated, it completely refreshes its list of dependencies. In your example, you would never have both a and b. You would always have one or the other.

NullVoxPopuli commented 11 months ago

Is that for memory efficiency? Or are there other reasons?

littledan commented 11 months ago

Yes, I was imagining that the set of dependencies is just the set seen on the most recent execution (and initially there are no dependencies, but the signal is marked dirty).
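A simplified sketch of that lifecycle (illustrative only -- the deps/dirty fields and the tracking mechanism here are made up, not the proposal's actual algorithm):

// Illustrative pseudocode, not the spec machinery.
let currentlyEvaluating = null;

function trackRead(signal) {
  // Called on every signal read; records the signal on the running computed.
  if (currentlyEvaluating !== null) {
    currentlyEvaluating.deps.add(signal);
  }
}

function evaluate(computed) {
  const previousDeps = computed.deps;
  computed.deps = new Set(); // the dependency set is rebuilt from scratch each run
  currentlyEvaluating = computed;
  try {
    computed.value = computed.callback(); // reads inside here call trackRead()
  } finally {
    currentlyEvaluating = null;
  }
  computed.dirty = false;
  for (const dep of previousDeps) {
    if (!computed.deps.has(dep)) {
      dep.dependents.delete(computed); // anything not read this run is dropped
    }
  }
}

// A new computed starts with deps = new Set() and dirty = true, so the first
// read evaluates it and records whichever branch actually ran.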

shaylew commented 11 months ago

I think the mental model we want to give here is:

The thing that makes this scenario feel weird is that the computed depends on something not tracked by the reactive system at all: rerunning it can cause it to return a different value even if no signal changed. Spreadsheet systems call functions that can change their outputs even when the inputs don't change "volatile", and they cause all sorts of pain. (Math.random() is the canonical example of a volatile function.)

Is it possible for the language to only disallow randomness in conditions in computeds? What other sorts of things would make conditions hard to "make stable"? Would this be too hard of a task?

Don't think there's any reasonable way to disallow things like this. It's not just randomness; reading any untracked (non-signal) mutable state is effectively a volatile operation, and that includes accessing any property of a non-frozen JS object.

Linters could try to catch some of these (Math.random(), Date.now(), etc). If there was a way to detect whether or not you're currently running in a tracked context, dev-mode libraries might also be able to shim some of the worst offenders to throw or emit a loud warning if they were called from a computed. Hard to say whether that's worth it, but either way I don't think there's much we (or implementers) could do to help.
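A rough sketch of what such a dev-mode shim could look like, assuming something like the proposal's Signal.subtle.currentComputed() (or any equivalent way to ask "am I in a tracked context?") is available:

// Dev-mode only: warn loudly when a known-volatile function runs inside a computed.
const realRandom = Math.random;
Math.random = function () {
  if (Signal.subtle.currentComputed()) {
    console.warn('Math.random() called inside a computed; its result is not tracked.');
  }
  return realRandom();
};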

pzuraq commented 11 months ago

I think it's worth noting that this scenario arises in any dependency tracking system that has a dynamic graph. For instance, you can produce a similar issue using higher-order Observables in Rx.js.

import { Observable, map, concatAll } from 'rxjs';

const a = new Observable((subscriber) => {
  subscriber.next(100);
});

const b = new Observable((subscriber) => {
  subscriber.next(200);
});

const c = new Observable((subscriber) => {
  subscriber.next(300);
});

c.pipe(
  map(() => {
    // Which inner Observable gets flattened is decided dynamically, per emission.
    if (Math.random() > 0.5) {
      return a;
    }
    return b;
  }),
  concatAll(),
);

Dynamic dependency graphs are generally necessary in order to make the graph more efficient: they allow you to skip a step if that step's dependencies have not changed. But combined with a language like JS, which makes it easy to reach state outside of the dependency graph, it's very possible to introduce inconsistency, where runs produce different output even though none of the tracked inputs have changed.

So I'd argue the issue is less about untracked dependencies causing dynamic graphs and execution, as that can exist in both explicit and implicit systems. It's more about the ergonomics, and whether one approach leads to more or less accidental usage of untracked state.

The pros of explicit tracking are:

That's basically the only benefit I can think of (feel free to add more, I could definitely be missing some). The cons, however, are pretty massive:

I think the ecosystem has been gravitating toward autotracking dependencies because it solves these issues:

We were nervous about adding this system to Ember.js at first and debated it for quite a long time with essentially the same concerns that @benlesh has noted here. But after seeing it proven out by MobX and Vue, we went for it. We found that code quality, understandability, and performance increased across the board, and counterintuitively, dependencies were easier to understand and track through the system.

Last couple of notes:

There's not really a precedent for this sort of thing in the language.

I would argue that Garbage Collection is very similar to this - rather than explicitly tracking ownership of heap allocations like in Rust or C, we track them implicitly via usage and if something gets disconnected from the graph, it is cleaned up. Garbage collection does introduce some complexity - you can avoid thinking about it, until you have to think about it, and then all of a sudden you have to reason about a lot of things in the system. But the rules are relatively simple, and it's possible to write safe libraries and frameworks that don't leak memory, and to debug issues when they arise.

Some would argue that the convenience is not worth the overhead, and that we're just stuck with it because that's how JS was built in the first place, but I also know many senior developers who absolutely prefer working in a GC'd language and not worrying about those details most of the time. I've also found that autotracking dependencies in Signal systems is actually easier to understand and debug than GC, since you can't have cycles (a cycle just throws "Maximum call stack size exceeded" in most impls, since computeds are "just functions", and I think we absolutely should keep that behavior here) and you have much more explicit annotations in general than with memory management.

In particular the useEffect deps array from React comes to mind, and honestly I've never been bitten by stale references so much in my life as when I switched to React hooks (prior to there being lint rules to help with this).

This is actually an example of explicit dependencies gone wrong! In the React system, you're responsible for listing out and managing those dependencies, and you have to absorb their changes, the staleness of object references, etc. In Signals systems that kind of staleness isn't possible, because each State decides whether its Computeds should update when it is set. You can have the State do a deep or shallow equals on the value, and externally you are always looking at the most recent value of each Signal. Changes don't propagate unless a State or Computed says they do, and if they say they do, they always propagate. No gotchas there.
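For example, with the equals option from the proposal (shallowEqual here is just an inline helper for illustration):

// Only notify dependents when the value actually changes by a shallow comparison.
const shallowEqual = (x, y) =>
  Object.keys(x).length === Object.keys(y).length &&
  Object.keys(x).every((k) => x[k] === y[k]);

const user = new Signal.State({ name: 'Ada', age: 36 }, { equals: shallowEqual });

const greeting = new Signal.Computed(() => `Hello, ${user.get().name}!`);

greeting.get();                       // "Hello, Ada!"
user.set({ name: 'Ada', age: 36 });   // shallow-equal: greeting stays valid
user.set({ name: 'Grace', age: 45 }); // different: greeting recomputes on next read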

littledan commented 11 months ago

OK, to answer the original question: I think we can conclude that the rough consensus in this thread is to keep going with auto tracking. There's some very good writing in this thread, and it'd be great if it could be moved to the README. Could someone make a PR for this, and then we can close this issue once the resolution is documented?