tc39 / proposal-signals

A proposal to add signals to JavaScript.
MIT License
2.95k stars 54 forks

Reducers #135

Open samholmes opened 1 month ago

samholmes commented 1 month ago

Intro

Firstly, I'd like to say thank you to all involved in putting together this proposal to move a signals spec and standard forward. It's really exciting!

I've been working with signals in concept for over 4 years, starting with a proof-of-concept library, IronJS (https://ironjs.org). Now we have many implementations and takes on signals, each with unique capabilities and trade-offs.

I've recently been exploring minimalism for a signals API with a project called Flash (https://github.com/flash-js/core). Its take on signals is to consider them as just functions that can react to other functions.

It has "static signals", which hold a cached value in memory, and it has computed signals. These are common semantics that other implementations share, and they appear to be included in the proposal.

Proposal

The one new, yet very powerful, semantic is the concept of what is being dubbed a "reducer signal". It is a computed signal which references its previous value from the compute function. By referencing its previous value, it requires memory and therefore makes it like a static signal in this respect. The ability for a signal to reference itself can enable a host of applications, because it gives a computed signal a standard way to manage its own life-cycle and state. With reducer signals, the ability to reduce any signal over time is trivial, where it would otherwise require multiple signals or hacks to accomplish. This additional API of self-reference is minimal for the gains it offers.

I'll provide some examples of its use and applications for consideration in this proposal. Unless I have missed something in the spec, this is not yet included in the proposal.

Reducer API

const reducerSignal = on((prev = own(initialValue)) => {
  return prev + dependency()
})

This signal references its previous value using own(). On the first invocation of the compute function, own() returns its default argument (or undefined if none is given). On subsequent invocations, it returns the value produced by the previous invocation of the compute function. This self-reference is the defining characteristic of a reducer function and hence a reducer signal: a signal can reduce to a value over time. This introduces a life-cycle (a time factor) to the signal beyond the momentary state declaration of an ordinary signal. The purpose of this design is to isolate such a time-based signal behind an API that remains declarative and easy to reason about; otherwise, the complexity of such a thing can easily lead to more error-prone variants that achieve the same goal.
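To make the semantics concrete, here is a minimal, non-reactive model of the `on`/`own` behavior described above. This is my own illustrative sketch, not Flash's actual implementation: it models only re-invocation of the compute function, with no dependency tracking.

```javascript
// Hypothetical minimal model of `on`/`own`. A module-level slot tracks which
// signal's compute function is currently running, so `own` can find its state.
let currentState = null

function own(initial) {
  // Returns the previous compute result, or `initial` on the first invocation
  return currentState.hasPrev ? currentState.prev : initial
}

function on(compute) {
  const state = { prev: undefined, hasPrev: false }
  return function signal() {
    const saved = currentState
    currentState = state
    try {
      // Default parameters are evaluated per call, so `prev = own(0)` below
      // picks up this signal's previous value on every invocation
      state.prev = compute()
      state.hasPrev = true
      return state.prev
    } finally {
      currentState = saved
    }
  }
}

// A reducer that accumulates an input across invocations
let input = 0
const total = on((prev = own(0)) => prev + input)
input = 5
console.log(total()) // 5
input = 3
console.log(total()) // 8
```

The key detail is that the previous value is per-signal state, restored around each invocation of the compute function.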

Life-Cycle Management

In this example, we illustrate how a self-reference can be leveraged for explicit life-cycle management/hooks.

const signal = on((cleanup = own(() => {})) => {
  cleanup()
  const id = setInterval(() => { /* ... */ }, 1000)
  return () => clearInterval(id)
})

This example illustrates how cleanup functions similar to the life-cycle event in React's useEffect could trivially be implemented. This implementation could be abstracted into a utility for convenience. It is not limited to a single life-cycle event because it can be extended with further complexity when the application requires it.

Control Flow

const batch = on((acc = own([])) => {
  if (acc.length === size - 1) {
    return [...acc, op()] // new batch of size; new reference notifies dependents
  }
  if (acc.length >= size) {
    // Start a new batch using the old batch reference
    acc.splice(0, acc.length, op())
  } else {
    // Accumulate into the current batch
    acc.push(op())
  }
  // Return current batch (same reference)
  return acc
})

const render = on(() => {
  paintEffect(batch())
})

In this example, we have a reducer over operations op() that are accumulated in acc for the batch signal. Only when the batch reaches a length of size - 1 do we create a new array reference of size operations and return it as the new value for batch. This triggers target signals downstream to re-compute over batch. Otherwise, we build on the current accumulator within the compute function for batch, resetting it for a new batch until we reach the size - 1 condition again and broadcast the new batch.

This is a useful pattern and control-flow mechanism for signal graphs. I've yet to see APIs that provide a way to trivially declare such control flow. This is one example of control flow that can be achieved with reducers. There may be less contrived examples, but this one is here simply to showcase the application reducer signals have to the control-flow aspect of a program's design.

Conclusion

I hope to have made a case for considering the capabilities of self-referencing signals and the use-cases that derive from having them. I'd very much like to weigh the pros and cons of such a proposal and any alternatives that achieve the same goals with less API overhead, if any such alternative exists. Otherwise, I strongly believe "reducer signals" to be a novel semantic too valuable to pass on within this standard's proposal.

Addendum: Note on Flash

The Flash library referenced is a work in progress and not a complete functional implementation of the proposals therein, yet. The goal of that project is to distill down the concept of a signal as a reactive function and the minimal API necessary to achieve that primitive.

EisenbergEffect commented 1 month ago

This is a really interesting idea. Would it technically only require the Signal.Computed() callback to receive the previously computed value as an argument?

fabiospampinato commented 1 month ago

By referencing its previous value, it requires memory and therefore makes it like a static signal in this respect

Technically it doesn't really require extra memory because the computed has to remember its previous value anyway.

IMO something like this can be easily implemented in userland, except if the system supports transitions, in which case the computed itself would need to support it, I guess, as you can't just store the previous value in a random variable in the closure anymore if the graph can be forked and the same computation can kinda simultaneously have different values.
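For illustration, the userland approach amounts to something like the sketch below. The `withPrevious` helper is hypothetical, not part of the proposal; the closure-captured `prev` is exactly the state that becomes problematic under forking, since a single variable can't hold two simultaneous versions of the value.

```javascript
// Hypothetical userland wrapper: give a compute function access to its
// previous return value via a closure.
function withPrevious(compute, initial) {
  let prev = initial // single copy of state; breaks if the graph is forked
  return () => {
    prev = compute(prev)
    return prev
  }
}

// Intended usage with the proposal's API (not executed here):
//   const total = new Signal.Computed(withPrevious(prev => prev + dep.get(), 0))
const step = withPrevious(prev => prev + 1, 0)
console.log(step()) // 1
console.log(step()) // 2
```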

EisenbergEffect commented 1 month ago

cc/ @shaylew Something to keep in mind as you explore transitions/forks.

trueadm commented 1 month ago

In the case of forks/transitions, the computed would need to re-fire providing the new value. I guess the one thing here to consider is that the computed must be a pure function if relying on this heuristic, otherwise this will cause a world of problems (that's why React has had to double invoke effects).

samholmes commented 1 month ago

This is a really interesting idea. Would it technically only require the Signal.Computed() callback to receive the previously computed value as an argument?

That's all it would require in the design proposed here. With Flash, a separate function was opted for because computed signals in Flash hold no cached value. Only when an explicit use of own is made will the compute function hold a value. If Signal.Computed holds a value/memory footprint, then parameterization is feasible for the design of the spec's API. However, I would opt for consideration of cache-less/memory-less computed signals in the proposal.

IMO something like this can be easily implemented in userland, except if the system supports transitions, where the computed itself would need to support this I guess, as you can't just store the previous value in a random variable in the closure anymore if the graph can be forked and the same computation can kinda simultaneously have different values.

In Flash, the compute does not need to hold onto its return value; it is ephemeral by default and strictly a compute function. Accessing the value of a computed signal is to invoke its compute function to derive the value, unless the compute function is a reducer.

I'm not familiar with transitions. Could you reference the discussion on transitions please?

In the case of forks/transitions, the computed would need to re-fire providing the new value. I guess the one thing here to consider is that the computed must be a pure function if relying on this heuristic, otherwise this will cause a world of problems (that's why React has had to double invoke effects).

I'd presume purity on compute functions is an inescapable matter. What sort of considerations has the proposal made to mitigate any potential foot-guns that an impure compute function may introduce that would otherwise be a step backwards if self-reference were to be included in the spec?

fabiospampinato commented 1 month ago

In Flash, the compute does not need to hold onto its return value; it is ephemeral by default and strictly a compute function. Accessing the value of a computed signal is to invoke its compute function to derive the value, unless the compute function is a reducer.

Here they work differently: if the value doesn't change between executions, they don't tell their dependents to refresh themselves, and in order to know whether the value actually changed they need to hold a reference to their old value. That's kind of the whole point of a signal; a computed is just a signal that gets updated according to a function you give it.
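That behavior can be sketched roughly as follows. This is an illustrative model only, not the proposal's implementation; the names and the use of Object.is as the default equality are assumptions based on the description above.

```javascript
// Sketch: a computed remembers its previous value so it can skip notifying
// dependents when a recomputation produces the same value.
function makeComputed(compute) {
  let cached
  let hasValue = false
  let notifications = 0
  return {
    get() {
      const next = compute()
      if (!hasValue || !Object.is(cached, next)) {
        cached = next
        hasValue = true
        notifications++ // dependents would be told to refresh here
      }
      return cached
    },
    get notifications() { return notifications },
  }
}

let source = 1
const doubled = makeComputed(() => source * 2)
doubled.get() // computes 2, notifies (first value)
doubled.get() // still 2, no notification
source = 3
doubled.get() // computes 6, notifies
console.log(doubled.notifications) // 2
```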

I'm not familiar with transitions. Could you reference the discussion on transitions please?

I don't have any pointers about that, unfortunately; the only stuff I know about transitions I've learned in Solid's Discord. I haven't implemented this myself (it doesn't seem worth the complexity and chance of bugs, as the code needs to be written in a way that's safe under transitions, which seems like a major pain in the ass).

shaylew commented 1 month ago

This is closely related to the "classic" functional reactive programming stuff, which is basically "circuits with a delay operator". It's a pretty clean/principled way to introduce state/history into a reactive system, although it does move you away from from-scratch consistency in some respects.

...Unfortunately it kind of only makes sense for eager Computeds, not lazy ones. If you try to mix this with laziness, you get something that depends not only on the history of its inputs, but the history of when it was observed or not observed. Maybe eager Computeds would be useful for other reasons, in which case giving that type of Computed access to its previous value seems reasonable enough.

@EisenbergEffect Yeah, that's something I should keep in mind there -- a Computed that has observable effects isn't necessarily safe to clone/rerun in a speculative transaction, and having mutable state captured in the closure of a Computed is a common pattern for building some kinds of behavior like history-sensitivity.

fabiospampinato commented 1 month ago

...Unfortunately it kind of only makes sense for eager Computeds, not lazy ones. If you try to mix this with laziness, you get something that depends not only on the history of its inputs, but the history of when it was observed or not observed

I'm not sure I'm following the reasoning 🤔 Isn't the last value of the computed orthogonal to whether the computed is an eager one or not? I'm pretty sure Solid isn't going to ditch support for this once they switch to lazy computeds, for example.

shaylew commented 1 month ago

It's implementable for lazy computeds but it's no longer very well-behaved. A lazy computed only runs when someone reads it, so its previous value isn't determined solely by the history of its dependencies; you also have to know the history of when it was read to know which states it saw and which it skipped over.
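The read-timing dependence can be shown with a small model. This is my own illustration of the point above, with hypothetical names: `updates` models a source's successive values, and `readAt` models the update indices at which a dependent actually pulls the lazy reducer.

```javascript
// Sketch: a lazy reducer only observes the states at which it is read,
// so its accumulated history depends on when reads happen, not just on
// the history of its dependencies.
function historySeen(updates, readAt) {
  let latest
  const seen = []
  updates.forEach((value, i) => {
    latest = value // the source always advances
    if (readAt.includes(i)) {
      seen.push(latest) // a lazy read sees only the latest state
    }
  })
  return seen
}

// Read after every update (eager-like): the reducer sees every state
console.log(historySeen([1, 2, 3, 4], [0, 1, 2, 3])) // [1, 2, 3, 4]
// Read only once at the end: intermediate states are skipped entirely
console.log(historySeen([1, 2, 3, 4], [3])) // [4]
```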

fabiospampinato commented 1 month ago

I see 🤔 I guess at the end of the day something like that can in some sense ~always happen: if you have only eager computeds, you are probably going to need a way to pause updates so you don't refresh things a million times unnecessarily when updating lots of signals, which creates a similar situation.

samholmes commented 1 month ago

What is the benefit of laziness over eager?

shaylew commented 1 month ago

The annoying thing here is it's still possible to write well-behaved lazy computeds that make some use of history, as long as the history doesn't affect their value (as defined by their own equals function).

For instance you might be building up an object piece by piece, and reusing pieces that would be the same as last time can let you avoid allocation, keep downstream computeds from having to recompute as often, or make the equality check cheaper.

This is very much not the original use case for reducers, though; "you can have the previous value, as long as you don't do anything with it that affects your output" makes for a tricky API and there's not much the framework can do to tell whether you're straying from the subset with clean semantics.
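The "reuse without affecting the output" idea can be sketched like this. `stableMap` is a hypothetical helper of my own, not proposal API: it uses the previous value only to decide whether to return the old reference, never to change the output's contents.

```javascript
// Sketch: reuse the previous object when the rebuilt one has identical
// contents, so reference-equality checks downstream see "no change" and
// allocation of a fresh result is avoided.
function stableMap(prev, next) {
  const keys = Object.keys(next)
  const same =
    prev !== undefined &&
    keys.length === Object.keys(prev).length &&
    keys.every(k => Object.is(prev[k], next[k]))
  return same ? prev : next
}

const a = stableMap(undefined, { x: 1, y: 2 })
const b = stableMap(a, { x: 1, y: 2 })
const c = stableMap(b, { x: 1, y: 9 })
console.log(a === b) // true: identical contents, old reference reused
console.log(b === c) // false: contents changed, new reference
```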

samholmes commented 1 month ago

Here they work differently: if the value doesn't change between executions, they don't tell their dependents to refresh themselves, and in order to know whether the value actually changed they need to hold a reference to their old value. That's kind of the whole point of a signal; a computed is just a signal that gets updated according to a function you give it.

A computed signal that doesn't cache its value would always impact its targets (downstream) unless you explicitly opt for this behavior and make it stateful.

const stateless = on(() => source())
const stateful = on((_ = own()) => source())

The reason I chose this in the design for Flash is because you can have both patterns if you opt for stateless by default.

I'm curious as to the primary motivation behind "laziness" for signal implementations. It seems like a needless feature since any signal not worth running shouldn't be declared anyway. Laziness reverses the control-flow; why have it backwards like this?

Addendum

Seems like a pull-model gives you some built-in optimizations because the effects determine whether the signals execute. As a developer, your control over performance comes down to how you manage your effects. You'd have to consider how you mount/unmount effects such that you can do batching of external state changes at the ends of your graph.

With a push model and a reducer, you can have full control over your state changes and therefore optimize where needed. With reducers, you have the ability to control the flow of state changes at any point in the graph, though it may matter most at the end of your graph where the effects are anyway.

It seems like both approaches have pros and cons that don't clearly distinguish them from each other. It ultimately comes down to preference at this point, unless I'm missing something about laziness that someone would care to share.

DavidANeil commented 1 month ago

I'm curious as to the primary motivation behind "laziness" for signal implementations.

I am curious why you need a signal if you don't want the laziness and caching? Those are the two main benefits of this data-type.

If a Computed does not cache its value, then it must be always dirty. If the abstraction isn't caching the value, then it can't know that it would, might, or would not change when re-calculated. For example: what would the behavior be of new Signal.Computed(() => Math.random()) without a caching mechanism?

And as for laziness: I tried disabling this in our application just to see, and it increased the number of Computeds we calculate from 20,000 to 300,000. Not to mention that that probably introduced "glitches" into those computations. And we provably never used the computed value for anything "real".

samholmes commented 1 month ago

I am curious why you need a signal if you don't want the laziness and caching? Those are the two main benefits of this data-type.

Control-flow and state management in a declarative style.

If a Computed does not cache its value, then it must be always dirty. If the abstraction isn't caching the value, then it can't know that it would, might, or would not change when re-calculated. For example: what would the behavior be of new Signal.Computed(() => Math.random()) without a caching mechanism?

It would be as you expect: always dirty. If a Computed caches its value, Math.random() would always cascade as a change down the graph, aside from the slim chance that Math.random() === cachedValue. If it didn't have a cached value, the behavior is idempotent.

And as for laziness: I tried disabling this in our application just to see, and it increased the number of Computeds we calculate from 20,000 to 300,000. Not to mention that that probably introduced "glitches" into those computations. And we provably never used the computed value for anything "real".

What sort of "glitches" would it introduce? I'm curious as to what you mean by glitches here.

Of course, every signal change propagating down the graph would cause more computed calls. But if computed signals are lazy and an effect is "mounted", how would there be any fewer computed calls? I suppose in the event that an effect pulls at the end of the graph, it would query the graph backwards, grab the latest cached values that aren't dirty, recompute dirty signals, and stop the traversal if the new computed value matches the cache?

This means that in a lazy-based system, not all signal changes necessarily have an effect on the system, unless a change directly propagates to the target signals downstream all the way to the endpoints (effects). How is this different from a push system, which will also propagate changes downstream and stop at nodes whose cached value is identical to the new one?