tc39 / proposal-signals

A proposal to add signals to JavaScript.

Document motivation for dynamic dependency tracking #9

Open · benlesh opened 11 months ago

benlesh commented 11 months ago

From the README:

For example, if the computed signal has a conditional in it, and only depends on a particular other signal during one of the branches, then if that branch is not hit, a dependency is not recorded, and an update to that signal does not trigger recalculation.

So if someone has the following code, are we going to meet their expectations?

const a = new Signal.State(100);
const b = new Signal.State(200);

const c = new Signal.Computed(() => {
  // Which signal gets read depends on an untracked coin flip.
  if (Math.random() > 0.5) {
    return a.get();
  }
  return b.get();
});

Either branch is potentially unreachable.

So I have questions:

  1. Are we saying that they need to be able to sort of execute the code in their head and realize what dependencies may or may not trigger the notification of the computed property?
  2. Are the dependencies added when a subsequent run hits them?
  3. Are they removed if they don't get hit?
  4. Should there be a language feature for reading a signal that is required to appear at the beginning of a function? (This would force all dependencies to be read at the top.) Or maybe even a language feature for computed signals, or for signals in general?

There's not really a precedent for this sort of thing in the language. There are definitely various precedents in frameworks. In particular the useEffect deps array from React comes to mind, and honestly I've never been bitten by stale references so much in my life as when I switched to React hooks (prior to there being lint rules to help with this).

NullVoxPopuli commented 11 months ago

Are we saying that they need to be able to sort of execute the code in their head and realize what dependencies may or may not trigger the notification of the computed property?

imo, yes -- but I think this can be solved with documentation and debugging tools. In userland we could attempt a sophisticated lint, though that might require aggressive static analysis if the signals come from "far away" (maybe TS can help here?).

The specific example may be a little too narrow, but generally (and you probably know this -- I'm stating it to share my background with this behavior, and so that folks who are unfamiliar have the context; I'm also "thinking out loud" to myself), the way it works in real apps is that you don't actually want to run b() until the condition is false, and the condition is usually based on other signals. So when the condition-signal becomes false, b runs, and the overall Computed entangles with whatever dependencies are detected while evaluating b().
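Something like this, to make that shape concrete (using the proposal's .get()/.set() API; the cond signal is just a stand-in for "the condition is usually based on other signals"):

const cond = new Signal.State(true);
const a = new Signal.State(100);
const b = new Signal.State(200);

const c = new Signal.Computed(() => {
  // While cond is true, only cond and a are tracked; updates to b are ignored.
  if (cond.get()) {
    return a.get();
  }
  // Once cond flips to false, the next evaluation reads b, so c now
  // entangles with cond and b (and drops a).
  return b.get();
});

c.get();          // 100 -- c depends on { cond, a }
cond.set(false);
c.get();          // 200 -- c depends on { cond, b }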

If someone is expecting b()'s signals to be entangled, we'll need to be sure debugging tools show that only a()'s signals have been entangled in the case where the condition is true.

OR maybe we can disallow "uncertainty" in "Computeds"? Maybe that'd eliminate any potential confusion altogether?

Are the dependencies added when a subsequent run hits them?

yes :tada: (at least, this is how Starbeam and Glimmer work, and it's what I'd expect of any "auto-tracking" system that doesn't require dependency arrays, which is what Ember/Glimmer moved away from)

Are they removed if they don't get hit?

can't remove what wasn't added?

This would force all dependencies to be read at the top.

I don't think I'd want this, as there are scenarios where you want the lazy and deferred evaluation / entangling.

Is it possible for the language to only disallow randomness in conditions in computeds? What other sorts of things would make conditions hard to "make stable"? Would this be too hard of a task?

benlesh commented 11 months ago

can't remove what wasn't added?

Oh. Sorry. I meant: if, in the example above, a is hit on the first run but not the second, do we remove it from the dependencies list? I think no. But it's so weird that I'm not sure.

EisenbergEffect commented 11 months ago

The idea we've been discussing, and what I think most signal implementations do, is that each time the computed is evaluated, it completely refreshes its list of dependencies. In your example, you would never have both a and b. You would always have one or the other.

NullVoxPopuli commented 11 months ago

Is that for memory efficiency? Or are there other reasons?

littledan commented 11 months ago

Yes, I was imagining that the set of dependencies is just the set seen on the most recent execution (and initially there are no dependencies, but the signal is marked dirty).
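A simplified sketch of that lifecycle (illustrative only -- the deps/dirty fields and the tracking mechanism here are made up, not the proposal's actual algorithm):

// Illustrative pseudocode, not the spec machinery.
let currentlyEvaluating = null;

function trackRead(signal) {
  // Called on every signal read; records the signal on the running computed.
  if (currentlyEvaluating !== null) {
    currentlyEvaluating.deps.add(signal);
  }
}

function evaluate(computed) {
  const previousDeps = computed.deps;
  computed.deps = new Set(); // the dependency set is rebuilt from scratch each run
  currentlyEvaluating = computed;
  try {
    computed.value = computed.callback(); // reads inside here call trackRead()
  } finally {
    currentlyEvaluating = null;
  }
  computed.dirty = false;
  for (const dep of previousDeps) {
    if (!computed.deps.has(dep)) {
      dep.dependents.delete(computed); // anything not read this run is dropped
    }
  }
}

// A new computed starts with deps = new Set() and dirty = true, so the first
// read evaluates it and records whichever branch actually ran.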

shaylew commented 11 months ago

I think the mental model we want to give here is:

The thing that makes this scenario feel weird is that the computed depends on something not tracked by the reactive system at all: rerunning it can cause it to return a different value even if no signal changed. Spreadsheet systems call functions that can change their outputs even when the inputs don't change "volatile", and they cause all sorts of pain. (Math.random() is the canonical example of a volatile function.)

Is it possible for the language to only disallow randomness in conditions in computeds? What other sorts of things would make conditions hard to "make stable"? Would this be too hard of a task?

Don't think there's any reasonable way to disallow things like this. It's not just randomness; reading any untracked (non-signal) mutable state is effectively a volatile operation, and that includes accessing any property of a non-frozen JS object.

Linters could try to catch some of these (Math.random(), Date.now(), etc). If there was a way to detect whether or not you're currently running in a tracked context, dev-mode libraries might also be able to shim some of the worst offenders to throw or emit a loud warning if they were called from a computed. Hard to say whether that's worth it, but either way I don't think there's much we (or implementers) could do to help.
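A rough sketch of what such a dev-mode shim could look like, assuming something like the proposal's Signal.subtle.currentComputed() (or any equivalent way to ask "am I in a tracked context?") is available:

// Dev-mode only: warn loudly when a known-volatile function runs inside a computed.
const realRandom = Math.random;
Math.random = function () {
  if (Signal.subtle.currentComputed()) {
    console.warn('Math.random() called inside a computed; its result is not tracked.');
  }
  return realRandom();
};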

pzuraq commented 11 months ago

I think it's worth noting that this scenario arises in any dependency tracking system that has a dynamic graph. For instance, you can produce a similar issue using higher-order Observables in Rx.js.

import { Observable, map, concatAll } from 'rxjs';

const a = new Observable((subscriber) => {
  subscriber.next(100);
});

const b = new Observable((subscriber) => {
  subscriber.next(200);
});

const c = new Observable((subscriber) => {
  subscriber.next(300);
});

c.pipe(
  map(() => {
    // Which inner Observable gets flattened is decided dynamically, per emission.
    if (Math.random() > 0.5) {
      return a;
    }
    return b;
  }),
  concatAll(),
);

Dynamic dependency graphs are generally necessary in order to make the graph more efficient: they allow you to skip a step if that step's dependencies have not changed. But combined with a language like JS, which makes it easy to reach state outside of the dependency graph, it's very possible to introduce inconsistency, where runs produce different output even though none of the tracked inputs have changed.

So I'd argue the issue is less about untracked dependencies causing dynamic graphs and execution, as that can exist in both explicit and implicit systems. It's more about the ergonomics, and whether one approach leads to more or less accidental usage of untracked state.

The pros of explicit tracking are:

That's basically the only benefit I can think of (feel free to add more, I could definitely be missing some). The cons, however, are pretty massive:

I think the ecosystem has been gravitating toward autotracking dependencies because it solves these issues:

We were nervous about adding this system to Ember.js at first and debated it for quite a long time with essentially the same concerns that @benlesh has noted here. But after seeing it proven out by MobX and Vue, we went for it. We found that code quality, understandability, and performance increased across the board, and counterintuitively, dependencies were easier to understand and track through the system.

Last couple of notes:

There's not really a precedent for this sort of thing in the language.

I would argue that Garbage Collection is very similar to this - rather than explicitly tracking ownership of heap allocations like in Rust or C, we track them implicitly via usage and if something gets disconnected from the graph, it is cleaned up. Garbage collection does introduce some complexity - you can avoid thinking about it, until you have to think about it, and then all of a sudden you have to reason about a lot of things in the system. But the rules are relatively simple, and it's possible to write safe libraries and frameworks that don't leak memory, and to debug issues when they arise.

Some would argue that the convenience is not worth the overhead, and that we're just stuck with it because that's how JS was built in the first place, but I also know many senior developers who absolutely prefer working in a GC'd language and not worrying about those details most of the time. I've also found that autotracking dependencies in Signal systems is actually easier to understand and debug than GC, since you can't have cycles (a cycle just throws "Maximum call stack size exceeded" in most impls, since computeds are "just functions", and I think we absolutely should keep that behavior here) and you have much more explicit annotations in general than with memory management.

In particular the useEffect deps array from React comes to mind, and honestly I've never been bitten by stale references so much in my life as when I switched to React hooks (prior to there being lint rules to help with this).

This is actually an example of explicit dependencies gone wrong! In the React system, you're responsible for listing out and managing those dependencies, and you have to absorb their changes, the staleness of object references, etc. In Signals systems that kind of staleness isn't possible, because each State decides whether its Computeds should update when it is set. You can have the State do a deep or shallow equals on the value, and externally you are always looking at the most recent value of each Signal. Changes don't propagate unless a State or Computed says they do, and if they say they do, they always propagate. No gotchas there.
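For example, with the equals option from the proposal (shallowEqual here is just an inline helper for illustration):

// Only notify dependents when the value actually changes by a shallow comparison.
const shallowEqual = (x, y) =>
  Object.keys(x).length === Object.keys(y).length &&
  Object.keys(x).every((k) => x[k] === y[k]);

const user = new Signal.State({ name: 'Ada', age: 36 }, { equals: shallowEqual });

const greeting = new Signal.Computed(() => `Hello, ${user.get().name}!`);

greeting.get();                       // "Hello, Ada!"
user.set({ name: 'Ada', age: 36 });   // shallow-equal: greeting stays valid
user.set({ name: 'Grace', age: 45 }); // different: greeting recomputes on next read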

littledan commented 11 months ago

OK, to answer the original question: I think we can conclude that the rough consensus in this thread is to keep going with auto tracking. There's some very good writing in this thread, and it'd be great if it could be moved to the README. Could someone make a PR for this, and then we can close this issue once the resolution is documented?