tc39 / proposal-signals

A proposal to add signals to JavaScript.
MIT License

Core signal concerns (globals, auto/implicit tracking, async) #147

Open TarVK opened 6 months ago

TarVK commented 6 months ago

Concerns

I don't know if this is an appropriate place to start this discussion, but I've had some concerns about signals for a while now. I've worked on some state management systems in the past myself, and independently came up with something similar to signals. When I found out that signals already exist and are semi-standardized, I got quite excited. However, there are three major caveats I see with signals that I don't like:

  1. They rely on global state
  2. Dependency tracking is implicit
  3. Computed values require synchronous code

Reliance on globals

In order for computed values to be aware of their dependencies, dependencies have to communicate their presence to computed values. The approach I have seen several libraries take to achieve this is the following: before evaluating, a computed value tells some global entity G that it is about to evaluate. The computed value then runs and accesses a signal S, during which S informs G of the access. In turn, G can let the computed value know that it apparently depends on S.
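This pattern can be sketched in a few lines. The following is a hedged illustration of the general technique only, not the proposal's actual implementation; the names (`currentComputation`, `SignalState`, `Computed`) are made up for the example:

```javascript
// The global entity "G" from the description above.
let currentComputation = null;

class SignalState {
  #value;
  #dependents = new Set();
  constructor(value) { this.#value = value; }
  get() {
    // The signal S reports this access to G, which attributes it
    // to whichever computed value is currently evaluating.
    if (currentComputation !== null) currentComputation.deps.add(this);
    return this.#value;
  }
  set(value) {
    this.#value = value;
    for (const dependent of this.#dependents) dependent.markDirty();
  }
  addDependent(computed) { this.#dependents.add(computed); }
}

class Computed {
  #fn;
  #dirty = true;
  #cached;
  deps = new Set();
  constructor(fn) { this.#fn = fn; }
  markDirty() { this.#dirty = true; }
  get() {
    if (this.#dirty) {
      const previous = currentComputation;
      currentComputation = this; // announce "I am about to evaluate"
      this.deps.clear();
      try {
        this.#cached = this.#fn();
      } finally {
        currentComputation = previous; // restore for nested computeds
      }
      for (const signal of this.deps) signal.addDependent(this);
      this.#dirty = false;
    }
    return this.#cached;
  }
}
```

Here `currentComputation` plays the role of G: any `SignalState.get()` that happens while a `Computed` is evaluating silently registers a dependency.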

This approach relies on globals, and without special runtime language features, I don't think one can do much better than this with the current form factor of signals. The reliance on globals hurts interoperability between libraries, however, similar to the problem faced by React hooks. I don't know how this problem was solved in React (I believe it has been by now?), so it might be solvable. Additionally, if this becomes a core part of JavaScript itself, it might not be an issue to begin with. But I wanted to express my doubts either way.

Implicit dependency tracking

Dependency tracking is implicit in signals. This means that when looking at the code for computed values, you can not directly see which of the dependencies can cause recomputations. Now you might argue that you don't have to see it, that it's not relevant, but I think it can be for several reasons:

  1. New users: For new users learning the system and being introduced to signals, the current approach barely even hints at computed signals recomputing automatically when data changes, making it harder to understand what's going on and to work with the system without introducing bugs.
  2. Spotting mistakes: It is easy to read values of signals in an environment where automatic re-evaluation doesn't occur, without even being aware of this.
  3. Opting out of tracking: In certain scenarios you might want to access the value of a signal in a computed value, without actually registering it as a dependency. This is probably rare, but it is possible that you just want to have access to the value at that moment in time and nothing more.

I like the system that async/await introduces more. Here an explicit 'await' keyword exists that makes awaiting data explicit. Similarly, a 'watch' keyword could exist for computed values that makes dependency tracking explicit. For async/await it was decided that having an explicit await keyword was beneficial, even though this could also have been implicit. I don't know whether this choice was made for technical or usability reasons, but either way I believe it to have been the better choice.

Requiring synchronous code

One thing that seems to be lacking from most signal libraries/systems is asynchronous computations. There may be several reasons for this, potentially including hydration and server-side rendering concerns, but besides this, the current way of dependency tracking also limits potential async implementations. Ideally, I would want to be able to write something like this:

const somethingLoadable: Signal<number> = AsyncComputed(()=>{
  const val = someDependency.get();
  const newVal = val*2;
  return new Promise(res=>setTimeout(()=>res(newVal), 1000)); // Dummy delay promise
}, 0);

This would result in a signal with a default value of 0. When accessed, it would start the computation, but without updating its value yet, since the computation does not finish synchronously; it would update its value and signal the change only afterwards, when the computation finishes.
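A minimal sketch of these semantics, assuming nothing about the proposal's API (the `AsyncComputed` name and behavior follow the description above; the implementation is purely illustrative and omits dependency tracking):

```javascript
// Hypothetical AsyncComputed: holds a default value, starts the async
// computation on first access, and only adopts the result (notifying
// listeners) once the promise settles.
class AsyncComputed {
  #compute;
  #value;
  #running = false;
  #listeners = new Set();
  constructor(compute, initialValue) {
    this.#compute = compute;
    this.#value = initialValue;
  }
  get() {
    if (!this.#running) {
      this.#running = true;
      Promise.resolve(this.#compute()).then((result) => {
        this.#running = false;
        this.#value = result;
        // Signal the change only now, after the computation finished.
        for (const listener of this.#listeners) listener();
      });
    }
    return this.#value; // default (or stale) value in the meantime
  }
  onChange(listener) {
    this.#listeners.add(listener);
    return () => this.#listeners.delete(listener);
  }
}
```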

I believe there can be a very nice interplay between asynchronous computations and signals in this way, simplifying how asynchronous computations are used in defining state. The current norm for dependency tracking however falls apart in this use case. If a value is accessed after some delay, the global entity orchestrating the whole process no longer knows what computed value to add the dependency to. This for instance would happen in the following code:

AsyncComputed(async ()=>{
  const someNumber = await getSomeNumber();
  return someNumber * someDependency.get();
});

Personal ideas

So let me first mention that I'm aware I'm not as deep into this material as the people working on this repository are, but because I've worked on my own system for a while and have thought about some of these tradeoffs, I did think it was worth reaching out.

All of the issues above were solved in my own system by a simple 'change' to the interface of computed values (this is actually just how I designed it before I learned about signals). In my system I work with a two-phase notification, which I believe is similar to what many signal libraries do, where I first mark dependents as dirty before dispatching a change event. As such, my signals (which I call "watchables" in my system) have the following interface:

/**
 * A watchable value type
 *
 * Invariants for every watchable `w`:
 * 1.  Notifies: if for `const v1 = w.get()` and `const v2 = w.get()` with `v1` obtained before `v2` we have `v1 != v2`, then `w` must have dispatched `w.change` events before `v2` could have been obtained
 * 2.  NoRedundantEvents: Between any two `w.get()` calls, at most a single `w.dirty` (or symmetrically `w.change`) event is dispatched
 * 3.  DirtyBeforeChange: When `w.change` is dispatched between two `w.get()` calls, `w.change` is always preceded by `w.dirty`: `w.get() ⋅ w.dirty ⋅ w.change ⋅ w.get()`
 */
export interface IWatchable<X> {
    /**
     * The current value
     * @returns The current value of the watchable
     */
    get(): X;
    /**
     * A function to register listeners for value dirtying, returns the unsubscribe function
     * @param listener The listener to be invoked
     * @returns A function that can be used to remove the listener
     */
    onDirty(listener: IRunnable): IRunnable;
    /**
     * A function to register listeners for value changes, returns the unsubscribe function
     * @param listener The listener to be invoked
     * @returns A function that can be used to remove the listener
     */
    onChange(listener: IRunnable): IRunnable;
}
export type IRunnable = () => void;

Then, computed values (which I call derived values) are defined using explicit dependency tracking. Consider for example the following fragment:

declare const val1: IWatchable<number>;
declare const val2: IWatchable<number>;
const derived = new Derived(watch=>{
  return watch(val1) * watch(val2);
});

Here the watch function both calls the get of the watchable, and registers the watchable as a dependency of the derived value. As a result, this system does not rely on globals, allows users to choose whether to actively watch dependencies, and can deal with asynchronous computations. In my own system I go a bit further and also attach metadata to watchables that allow for automatic tracking of additional data like whether the derived value depends on any data that's still loading. But this is probably outside the scope of a standard JavaScript primitive.
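For readers who want to see the moving parts, here is a hedged, self-contained sketch of this explicit-`watch` approach (simplified from the interface above; `Derived` re-collects its subscriptions on every run, so conditional dependencies still work):

```javascript
// A watchable with the two-phase (dirty, then change) notification
// described above, minus the invariant bookkeeping.
class Watchable {
  #value;
  #dirtyListeners = new Set();
  #changeListeners = new Set();
  constructor(value) { this.#value = value; }
  get() { return this.#value; }
  set(value) {
    if (value === this.#value) return;
    for (const listener of this.#dirtyListeners) listener();
    this.#value = value;
    for (const listener of this.#changeListeners) listener();
  }
  onDirty(listener) {
    this.#dirtyListeners.add(listener);
    return () => this.#dirtyListeners.delete(listener);
  }
  onChange(listener) {
    this.#changeListeners.add(listener);
    return () => this.#changeListeners.delete(listener);
  }
}

class Derived {
  #compute;
  #dirty = true;
  #cached;
  #unsubscribers = [];
  constructor(compute) { this.#compute = compute; }
  get() {
    if (this.#dirty) {
      // Dependencies may differ per run, so drop old subscriptions.
      for (const unsubscribe of this.#unsubscribers) unsubscribe();
      this.#unsubscribers = [];
      const watch = (watchable) => {
        // Explicitly register the dependency, then read its value.
        this.#unsubscribers.push(
          watchable.onChange(() => { this.#dirty = true; }),
        );
        return watchable.get();
      };
      this.#cached = this.#compute(watch);
      this.#dirty = false;
    }
    return this.#cached;
  }
}
```

No global is consulted at any point: the only channel between the derived value and its dependencies is the `watch` function passed into the computation.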

NullVoxPopuli commented 6 months ago

the global entity orchestrating the whole process no longer knows what computed value to add the dependency to.

imo, this is a feature, not a bug or design flaw -- you need some way to choose what signals to entangle with. As in: https://github.com/NullVoxPopuli/signal-utils/

import { Signal } from 'signal-polyfill';
import { signalFunction } from 'signal-utils/async-function';

const url = new Signal.State('...');
const signalResponse = signalFunction(async () => {
  const response = await fetch(url.get()); // entangles with `url`
  // after an await, you've detached from the signal auto-tracking
  return response.json(); 
});

// output: true
// after the fetch finishes
// output: false
<template>
  <output>{{signalResponse.isLoading}}</output>
</template>

This is implemented via an impure Computed:

const state = new State(fn);
const computed = new Signal.Computed(() => {
  state.retry();

  return state;
});

If we always tracked any signal accessed within an async function, we'd more easily have infinite loops -- any Signal accessed within state would entangle along with url changes -- which would make loading state (or any async state) much harder to handle.

return watch(val1) * watch(val2);

This is an interesting idea, and would indeed be very explicit about what's happening. However, it "feels like" (to me, I could be wrong! and please correct me if I am), that we'd lose autotracking, and the transparent abstractions we can achieve by hiding the underlying Signal API -- for example, app-devs consuming a Signals (as designed today), may not even know they have reactive data! and things would just work! :tada: (provided folks using destructuring in their component renderers do not do so too eagerly -- this is a DevEx thing being explored in frameworks that use functions as components / computeds / async computeds).

For example, if I'm a library author, and I implement a reactive array, such as: https://github.com/NullVoxPopuli/signal-utils/tree/main?tab=readme-ov-file#array, and then pass it to you, you could mutate the array and my library could just reactively update without you having any knowledge of reactivity -- it behaves just like a native array :tada:

DavidANeil commented 6 months ago

Requiring synchronous code

I believe AsyncContext would make it possible to have Computeds with async callbacks "just work", but I think a proposed integration between these two proposals should wait until AsyncContext is (at least) Stage 3.

TarVK commented 6 months ago

imo, this is a feature, not a bug or design flaw -- you need some way to choose what signals to entangle with

I'm sorry, but I don't quite follow this statement. Would it not be nicer if, after an await, it still tracked the following dependencies? Consider for example the following code:

const signalResponse = signalFunction(async () => {
  const response = await fetch(url.get()); // entangles with `url`
  const json = await response.json();
  return json.someField * someSignal.get(); // Should entangle with `someSignal`?
});

Would the current signal implementations throw an error here when accessing someSignal? I don't think so, right? Because signal values should also be accessible outside of contexts that register dependencies. So in that case, a developer is likely to think that their computed signalResponse will also update when someSignal changes, but in fact it does not. Again, I don't have enough experience with signals to know this for sure, so you should correct my train of thought if it's wrong, but to me this seems a bit error-prone?

we'd lose autotracking, and the transparent abstractions we can achieve by hiding the underlying Signal API

That is indeed correct. I don't think my own system is flawless either, there's just different tradeoffs to consider. These considerations become especially important when making something a real part of the language, which is why I thought I should mention it. I personally have my doubts whether this should be part of the language, but there are already different issues regarding that topic so I won't make any further mention of that here.

provided folks using destructuring in their component renderers do not do so too eagerly

This is actually where my concern lies. A fully automatic transparent system is really nice, if it indeed is fully transparent. However when there are caveats to consider I prefer non-transparent systems and forcing developers to be aware of what they are doing. I am not aware of how many caveats there are with the implicit system, so it might not be that bad, but this is at least part of where my concern comes from.

hediet commented 6 months ago

@TarVK I share your concerns about global state and implicit subscription! And I find your personal ideas very interesting! They are actually quite similar to what we have in VS Code (tutorial here).

Your interface is much simpler, but I wonder how you would support transactions (i.e. updating two source observables at once).

TarVK commented 6 months ago

Good to hear I'm not the only one with these concerns @hediet 🙂

Seems like VS Code observables (at least at a high level) function the same as the ones in my project, which is nice to see! I chose a slightly different interface, to be more similar to async/await syntax, but they seem to function similarly.

The transactions have also been a concern of mine. I see VS Code uses:

transaction((tx) => {
  observable1.set(6, tx);
  ...
  observable2.set(4, tx);
});

I also considered the interface you chose here, but eventually settled on something different. The main concern here (which you are probably aware of) is that it becomes difficult to abstract this over functions. E.g. if I write some function:

function calculateAndUpdate(tx: ISomeTransactionType) {
  const value = ...; // Compute the value according to some logic
  observable1.set(value, tx);
}

I now have to add this extra parameter to the signature in order to be able to use it in transactions. I have not found a great solution to this problem myself, but ended up creating a "mutation" type that is returned from updating fields:

function doSomething(): IMutator {
  const value = ...; // Compute the value according to some logic
  return myField.set(value);
}
function doSomething2(): IMutator {
  return synchronized(myField2.set(25), doSomething());
}

The idea is that mutations can be combined, and no reusable function should commit the mutations but should instead return them. Only event handlers would actually commit the final created mutations. Here is a gist of a planning document I wrote in the past, which contains this concept: watchables updating-multiple-fields-without-invalid-states . The main benefit of my approach is that no parameters/types have to be added to function signatures syntactically.
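To make the idea concrete, here is a small hypothetical sketch of the mutator pattern described above (the `synchronized` and `set`/`commit` names follow the comment, but this implementation is an assumption, not the actual watchables library):

```javascript
// Hypothetical Field whose setter returns an uncommitted mutation
// instead of updating immediately.
class Field {
  #value;
  constructor(value) { this.#value = value; }
  get() { return this.#value; }
  set(value) {
    // Nothing changes yet; the caller decides when to commit.
    return { commit: () => { this.#value = value; } };
  }
}

// Combine several mutations into one, committed together.
function synchronized(...mutators) {
  return { commit: () => { for (const m of mutators) m.commit(); } };
}
```

A reusable function returns `myField.set(value)` rather than committing it; only the outermost event handler calls `commit()`, so no transaction parameter has to appear in any signature.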

With my approach, however, it becomes harder to enforce that functions support transactions, which I think is slightly easier in VS Code's approach by restricting how transactions can be obtained while making them required parameters for setters. In addition, my approach has a large risk of users forgetting to call commit, resulting in bugs. I believe the issues in my approach could be mitigated with linters, but this is not ideal. I currently don't have a strong preference for either the mutator or the parameter approach; each seems to be lacking in my opinion.

It would be nice to hear more opinions on this issue, so I might be able to improve my own system in the future!

jeff-hykin commented 6 months ago

I share your concerns, but primarily no. 2: I am on the side of "be explicit" (if a value on the list changes, then the function runs; end of story).

I do think the proposal should stay as small as possible, so instead of solving the whole async problem, I'm curious what you think about having a warning system to limit the damage of implicit dependencies (which I don't like very much, but I think it's more likely to get accepted).

Discussion: Runtime Warnings for Autotrack

I think the biggest footgun of this proposal is using an external value like Math.random() in a conditional that then breaks auto-tracking of dependencies. The silver lining is that it's often possible to detect this kind of usage. For example:

new Signal.Computed(()=>{
    if (Math.random() > 0.5) {
        return person1Name.get()
    } else {
        return person2Name.get()
    }
})

Warnings can be done performantly by checking for a sequence-change in signals.

If person1Name was used on the first run, then person2Name on the second run, that's a warning, because the only way the first-evaluated signal can change is if an external value is being used as a conditional.
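That check can be sketched as a comparison of the ordered access logs of two runs (illustrative code, not part of the proposal):

```javascript
// Compare the ordered access logs of two runs of a computed. Each log
// entry records which signal was read and the value it returned.
// If the current run reads a different signal at position i even though
// positions 0..i-1 returned the same values as before, the branch must
// have been decided by something untracked (e.g. Math.random()).
function sequenceChangeWarning(previousRun, currentRun) {
  const length = Math.min(previousRun.length, currentRun.length);
  for (let i = 0; i < length; i++) {
    if (previousRun[i].signal !== currentRun[i].signal) {
      return "warning: branch on an untracked value suspected";
    }
    if (previousRun[i].value !== currentRun[i].value) {
      // A tracked value changed; divergence after this point is expected.
      return null;
    }
  }
  return null;
}
```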

While deeper values are more difficult. Ex:

new Signal.Computed(()=>{
    if (randomViewEnabled.get()) {
        if (Math.random()>0.5) {
            return person1Name.get()
        } else {
            return person2Name.get()
        }
    }
})

It's still possible to check: "if the value of the first signal did not change, then the same signal should always be read next". E.g. if randomViewEnabled stays true, but person2Name is evaluated the first time, and person1Name is evaluated the next time, then that is also a warning.

If warnings like this were extremely robust, I think it would massively reduce the foot-gun damage.

But I'm curious how you feel @TarVK (whether this would address your #2 or not)

jeff-hykin commented 6 months ago

And in support of your concerns:

It's incredibly useful to say "this computed didn't run, therefore the signals on this explicit list have not changed" WITHOUT looking at the contents of the function or the values of things like Math.random().

I can see adding some mechanism to Signal.Computed to opt into manual dependencies. But I can't see removing auto-tracking.

(^ From a different thread)

  1. While performance scenarios can be crafted favoring either autotracked or explicit dependencies, I think explicit dependencies are faster in many cases. And yes, performance isn't everything, but we are talking about a native API. The API should at minimum have a verbose/clunky option that allows for maximum performance in frequent scenarios.
  2. I think auto-track could be a WONDERFUL warning system when combined with explicit dependencies. E.g. if using a signal, but its not in the explicit tracking list, print a warning by default if the autotracked finds it. Require clunky API for disabling the warnings.
  3. Sometimes I intentionally don't want to track a value. E.g. I want to use the most up-to-date C value if value A or B changes, but otherwise I don't want value C busting the cache for an expensive computation. It's confusing to have a computed listener that just saves the value in a non-signal variable so that another computed value will bypass the auto-dependency detection.
  4. I don't think there's a good way to do async without explicit dependencies.

TarVK commented 5 months ago

I'm not sure I fully understand the following statement, could you elaborate:

this computed didn't run, therefore the signals on this explicit list have not changed

I do think that the warning system you propose, especially using auto tracking for warnings combined with explicit dependency tracking, could be very neat. In the explicit tracking system you may forget to watch a value and in the auto tracking system you may be unaware of whether a value will be reactive or not. So in either case, such a runtime warning system during development could be very valuable!

It does not fully address my concerns though. It's kind of hard to put my concerns into words I suppose, because it's more of a general scepticism of the "unknown". I already know that there are caveats with autotracking, e.g. regarding asynchronous computations. So my stance is more based on being unaware of what other major caveats one might run into (since I don't have enough experience using signals myself), and not knowing how these things will affect new developers. In this case, I prefer a system with a clunkier interface that is more predictable (if I told the system to watch a dependency it will be tracked and otherwise it will not be, end of story). So I do think your suggestion is very nice and addresses one of the known caveats of signals, but it does not address my broader concern related to auto dependency tracking.

jeff-hykin commented 5 months ago

I'm not sure I fully understand the following statement, could you elaborate:

this computed didn't run, therefore the signals on this explicit list have not changed

Definitely. So, if we have explicit tracking like an array of what to watch:

new Signal.Computed([person1Name, person2Name], ()=>{
    console.log("ran")
    if (randomViewEnabled.get()) {
        if (Math.random()>0.5) {
            return person1Name.get()
        } else {
            return person2Name.get()
        }
    }
})

If there's no "ran" printed, then that means, for sure, that person1Name and person2Name have not changed. We don't even need to know what the contents of the function are.

It's kind of hard to put my concerns into words I suppose, because it's more of a general scepticism of the "unknown".

That helps. I should word my question a different way though: if there were only these three choices, which would you pick?

  1. add signals api as-proposed
  2. add signals with warnings system
  3. no change (reject proposal)
NullVoxPopuli commented 5 months ago

Definitely. So, if we have explicit tracking like an array of what to watch:

Ember used to have computeds like this, and they were slower and more memory-intensive than auto-tracking. We found (adapting to the proposal for demonstration) that this would be more efficient on memory:

new Signal.Computed(()=>{
   console.log("ran")
   if (randomViewEnabled.get()) {
       let name1 = person1Name.get();
       let name2 = person2Name.get();

       if (Math.random() > 0.5) {
           return name1;
       }

       return name2;
   }
})

and still give you correct behavior with less cpu/memory resource consumption.

Tho, we've found that using a getter / function is faster than managing a cache for something this small:

// no cache present, "just works"
get randomName() {
   if (randomViewEnabled.get()) {
       let name1 = person1Name.get();
       let name2 = person2Name.get();

       if (Math.random() > 0.5) {
           return name1;
       }

       return name2;
   }
}

As for smaller derived values, there is often the tradeoff of "how much does it take to maintain a cache vs just access the values" -- where a cache is always going to increase memory consumption, no matter how you do it.

Now, that said, we do still need to have some benchmarking done on this implementation in the proposal, as the results could be different!

Just my 2 cents from one corner of JS <3

I know we have some competing experiences with whether or not explicit deps is faster or not (due to being in different ecosystems / environments), so I think the sooner we can formally test it out, the better!

jeff-hykin commented 5 months ago

I do think it's a bit unfair to heavily imply "slower", and then kinda backpedal on "smaller derived values" and benchmarking. At the theory level, static is faster for any non-conditional computed value (sorts, hashes, math, substrings, etc.). I don't know if that makes up the majority of in-the-wild computed values or not, but I do know it's a frequent use case.

But it's the other case, the has-conditional case, that I find incredibly ironic. My prof once said: "There is an algorithm that can compute, for any problem(!), the answer in O(1). The downside is the answer is wrong ... but still O(1)!". I would say it looks pretty bad to talk about auto-dependency tracking being faster for the first or second example I gave when the behavior is wrong for both of them.

"So speedy that new and senior devs alike shoot themselves in the foot without even noticing" is not really an argument in support of auto dependency tracking. (Keep the gun, just inform people when it's pointing at their foot.)

think the sooner we can formally test it out, the better!

I agree, we can let good benchmarks decide speed, but let us use the (lack of) footguns decide the API

lassevk commented 5 months ago

One reason why an explicit list of dependencies might be "slower" is that it wouldn't cater to conditional computeds.

Example:

new Signal.Computed([nameSelector, person1Name, person2Name], ()=>{
   console.log("ran")
   if (nameSelector.get())
      return person1Name.get();
   else
      return person2Name.get();
})

If nameSelector currently returns true, so the computed returns person1Name, this computed would be re-evaluated whenever person2Name changes, which in its current state is irrelevant.

Auto-tracking would only capture nameSelector and the current name being used, at all times, which are the only changes that should result in re-evaluation.

It wouldn't be hard to come up with other examples where more code is involved, which could result in more CPU cycles burning for no reason.

TarVK commented 5 months ago

Right, I see @jeff-hykin, then we were not fully on the same page. I personally am in favor of explicit dependency tracking in the form I mentioned at the end of my report, something like:

const derived = Signal.Computed(watch=>{
  return watch(val1) * watch(val2);
});

Or using an interface more similar to that of vs-code observables:

const derived = Signal.Computed(reader=>{
  return val1.get(reader) * val2.get(reader);
});

Most of the benefits you mentioned are lost here; with my approach, one:

  1. Needs to check the function body for dependencies (though they are somewhat highlighted)
  2. Gets no performance gains, dependencies are tracked per run

On the other hand it does have the benefits:

  1. Allowing for conditional dependencies
  2. Not requiring users to "duplicate" dependencies in two places

And we keep the shared benefits that both of us mentioned of:

  1. Allowing one to opt in to watching a dependency or not
  2. Knowing for sure whether a given dependency is tracked or not by looking at the function call rather than the implementation of the dependency
  3. Allowing for async computations to work smoothly

As for your question of the 3 options. As mentioned before I personally think this proposal should be rejected. I think that's outside the scope of this debate, but I will mention my reasons briefly:

  1. There seem to be many stances on how a primitive like this should work, and no clear consensus. We already saw 4 approaches in this short thread
  2. Since no syntax is attached to the proposal, a consensus on a standard (within specific contexts) can be reached without making it an official part of the language
  3. Making it an official part of the language will probably slow down experimentation with these types of primitives, since any other implementation will then have to compete with an official language feature for adoption

So taking that choice out of the question, would I prefer adding signals as proposed vs adding signals with a warning system? I would definitely prefer having one with a warning system. This is a feature I will now, to some extent, also try to add to my own system, thanks!

jeff-hykin commented 5 months ago

Thanks for all the clarifications @TarVK. I think you covered pretty much everything. I also would reject, though originally I would have for different reasons. I would definitely prefer your method over the proposal, but would probably still reject, because I think JavaScript has enough footguns, and signal-based footguns are the only ones senior college grads email me about.

Your reasons for rejection are strong points I hadn't considered, especially because they're kind of beyond the realm of personal preferences. And on that note, I don't think any of our ideas pass those criticisms.

I'll leave this summary for others visiting (correct me if something looks wrong)

DavidANeil commented 5 months ago

How would an invoker "explicitly read" the signals accessed in something like a natively-called generator function?

    public *[Symbol.iterator]() {
        for (const signal of this.internalSignals) {
            yield signal.get();
        }
    }
TarVK commented 5 months ago

@DavidANeil I think that for any explicit system, one makes the distinction between raw return values and "trackable" values. In my implementation those trackable values are IWatchables, in VS Code those are IObservables, etc. This item contains the actual value that we are interested in, as well as an interface that allows one to set up the tracking concerns.

Any function (in this case iterator) should return a trackable value rather than a raw value. Then from within the context of the invoker, both the tracking concerns can be set up and the value can be extracted (in the watchables case using watch). For the static list of dependencies approach, I'm not entirely sure how one would deal with a changing list of trackable items. When the list is constant (even if obtained from a generator) I see no problem.
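A small sketch of that distinction (all names illustrative): the iterator yields the trackable values themselves, and the invoker decides what to track when unwrapping them:

```javascript
// A watchable here is just "a value plus a tracking surface"; get()
// stands in for the full IWatchable interface.
const makeWatchable = (value) => ({ get: () => value });

class Collection {
  #items;
  constructor(items) { this.#items = items; }
  *[Symbol.iterator]() {
    // Yield the trackable values themselves; never call .get() here.
    yield* this.#items;
  }
}

// The invoker both registers dependencies and extracts values.
function sumTracked(collection) {
  const deps = new Set();
  const watch = (watchable) => {
    deps.add(watchable);    // tracking concern
    return watchable.get(); // value extraction
  };
  let sum = 0;
  for (const watchable of collection) sum += watch(watchable);
  return { sum, deps };
}
```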

hediet commented 5 months ago

@DavidANeil for implicit reads, I would actually have strong doubts whether field accesses in generator functions are (or rather can be) tracked correctly anyway. Any explicit syntax would give proof that it works/is supported.

jeff-hykin commented 5 months ago

For the static list of dependencies approach, I'm not entirely sure how one would deal with a changing list of trackable items. When the list is constant (even if obtained from a generator) I see no problem.

This is a good point, and effectively my next approach wouldn't be a true static list but more of a manually managed list. It could be a more refined version of something like:

let comp1 = Signal.Dynamic([sig1, sig2], (triggers)=>{
    // triggers is just [sig1, sig2]
    triggers.push(sig3)
    // now if sig3 changes the function will re-run
})

// alternative 
comp1.triggers.push(sig3)

// another alternative
let triggers = [sig1, sig2]
let comp2 = Signal.Dynamic(triggers, ()=>{
    triggers.push(sig3)
})
triggers.push(sig4)

parched commented 2 hours ago

@TarVK Is your library opensource? It sounds nice, I'd be keen to have a play with it.