tc39 / proposal-signals

A proposal to add signals to JavaScript.
MIT License
2.95k stars 54 forks source link

Watcher to accept WeakRef to allow watched live Signals to be Garbage Collected #156

Open btakita opened 1 month ago

btakita commented 1 month ago

If possible: A computed Signal should be garbage-collectable if nothing live is referencing it for possible future reads, even if it's linked into a broader graph which stays alive (e.g., by reading a state which remains live).

I created rmemo (reactive memo). A signal library. One of it's unique features is that it supports automatic Garbage Collection by using WeakRef.

My frustration with WeakRef is that it is relatively slow to instantiate. I would be preferable to have a single global WeakValueMap to manage the autosubscriber cache. Instead of instantiating a WeakRef for every live signal that has a dependency. An alternative way to manage the state in C/C++ would achieve the same result.

I believe a WeakValueMap would support primitive keys. WeakMap only allows object keys & object or primitive values. Since the key is allocated memory which would be subject to Garbage Collection. But a WeakValueMap is the other way around. Allowing object or primitive keys & object values. This would allow a numeric id for each Signal to be tracked by the WeakValueMap.

There is an implementation library called weak-value-map. It instantiates a WeakRef for each value, removing the performance benefit. See this discussion. There may be a valid use case for a WeakValueMap here.

littledan commented 1 month ago

Note that the current signal polyfill does support this weak collection already, through the technique of removing the source->sink backedges for unwatched sink nodes. It ends up not requiring any special GC support at all.

btakita commented 1 month ago

Maybe I misunderstood the intent of the quote & the definition of "live":

If possible: A computed Signal should be garbage-collectable if nothing live is referencing it for possible future reads, even if it's linked into a broader graph which stays alive (e.g., by reading a state which remains live).

Particulary:

even if it's linked into a broader graph which stays alive

I implemented rmemo to have a Signal be eligible for GC if unreferenced elsewhere. Even if there is a link. i.e. The library doesn't create a strong reference with a link in the graph.

through the technique of removing the source->sink backedges for unwatched sink nodes.

I read this as: if a sink is connected in a broader graph & not referenced elsewhere, that sink is not eligable for GC. Am I correct?


edit:

Addressing the next list in the README:

  • Note that most frameworks today require explicit disposal of computed Signals if they have any reference to or from another Signal graph which remains alive.
  • This ends up not being so bad when their lifetime is tied to the lifetime of a UI component, and effects need to be disposed of anyway.
  • If it is too expensive to execute with these semantics, then we should add explicit disposal (or "unlinking") of computed Signals to the API below, which currently lacks it.

The WeakValueMap solution would be a backstop for the unlinking in long lived graphs. This would be relevant for server side reactivity. Where a Computed instantiated in a HTTP request & referencing a process-lived Signal. Would be eligible for GC even when not explicitly unlinked at the end of the request.

backbone87 commented 1 month ago

upstream reference (sink -> source) can be hard since the compute function also references them hard. once the entire computed goes out of scope so does the upstream references.

downstream references (source -> sink) do not exist unless watched.

(that is also how jotai handles this)

what you are probably refering to is: can we "auto unmount" when a watcher goes out of scope?

alxhub commented 1 month ago

Watchers are by definition owned things - something has to destroy them, otherwise they'll keep notifying. It's similar to registering an event listener on the window.

It'd be unusual, but not unheard of, for a watcher to be dropped/leaked without destroying it. In that case, the watcher would be permanent, which may be intended behavior.

btakita commented 1 month ago

I have a better grasp on mapping the nomenclatures. In rmemo, a memo (equivalent to a Computed) can also be an effect. So Computed & effect are combined. The memo needs to remain in scope, otherwise it will be Garbage Collected.

So in terms of this Standard Proposal nomenclature. Once a live effect leaves scope, it & it's dependencies (not strongly held elsewhere) would instead be subject to GC. To keep a live effect around, it could be referenced by another object with the appropriate scope. As long as that "object with the appropriate scope" is strongly referenced, the attached effects are strongly referenced.

rmemo seems to be unique in this strategy. VanJS has a similar concept predicated on whether the DOM Element is attached to document. One the Element is unattached, it's Signals are subject to removal on the next update & then GC.

It's similar to registering an event listener on the window.

I believe that having the effects be subject to GC is more primitive behavior. With most implementations having a strong reference to live effects. Once an effect is unwatched, it can be deactivated & it's strong reference removed.

Perhaps the implementation library could have an option to decide if it's live effects are subject to GC or not. Defaulting to not being subject to GC (the current behavior).


Edit

It'd be unusual, but not unheard of, for a watcher to be dropped/leaked without destroying it. In that case, the watcher would be permanent, which may be intended behavior.

I'm not clear on what how a "common use case" is defined here. But it seems to me that this is related to component behavior. I'm concerned about state management independent of components. i.e. reactive domain models. Particularly on the server side or a reactive build.

My understanding is that there is a wide variety of implementations for reactive domain state. Which is why I want to advocate for the more flexible GC approach. Along with my interest in the rmemo library being compatible with this standard proposal.

backbone87 commented 1 month ago

you can register the watcher yourself in a finalization registry to auto-unmount. the reference impl in this repo currently wouldn't allow for this since one needs a strong reference to the watcher in order to dispose.

ugly draft:

const finalize = new FinalizationRegistry((watcher) => watcher.unwatch(...));
function createAutoDisposedWatcher(...args) {
  const watcher = new Signal.Watcher(...args);
  const proxy = new Proxy(watcher, Reflect);
  finalize.register(proxy, watcher)

  return proxy
}
alxhub commented 1 month ago

Having watchers that automatically shut down when they go out of scope feels very unpredictable and error-prone to me. Consider all of the warnings on FinalizationRegistry about the lack of guarantees and potential differences across engines, etc. In particular:

Developers shouldn't rely on cleanup callbacks for essential program logic.

And:

A conforming JavaScript implementation, even one that does garbage collection, is not required to call cleanup callbacks. When and whether it does so is entirely down to the implementation of the JavaScript engine. When a registered object is reclaimed, any cleanup callbacks for it may be called then, or some time later, or not at all.

btakita commented 1 month ago

you can register the watcher yourself in a finalization registry to auto-unmount. the reference impl in this repo currently wouldn't allow for this since one needs a strong reference to the watcher in order to dispose.

rmemo is a tiny fish here in a large lake. In my dream world, the watch function could take a WeakRef or better yet an option to store on a WeakValueMap (which would be require an addition to the standards).

Having watchers that automatically shut down when they go out of scope feels very unpredictable and error-prone to me.

I've been using this technique since October 2023. I have mostly experienced simplicity. I had one GC issue when implementing a helper function to chain cancelable promises that wait for a Signal's condition to be met. But otherwise, it's been a relatively simple matter to attach the Signal to an object or variable with the correct scope. Mostly it's a local variable, a ctx, a DOM Element, or a Promise.

A conforming JavaScript implementation, even one that does garbage collection, is not required to call cleanup callbacks. When and whether it does so is entirely down to the implementation of the JavaScript engine. When a registered object is reclaimed, any cleanup callbacks for it may be called then, or some time later, or not at all.

If a Signal is lazy, what is the difference? The watcher loop would longer than optimal if GC never happens? In practice, I've used the major browser engines, nodejs, & bun. GC seems to happen within a reasonable amount of time. Granted, someone could turn off GC. In that case, there is explicit unwatch.


edit

I'm not saying that this should be a widely used "best practice". But i don't think it should be verbotten. Javascript uses a GC for a reason. So we don't have to add extra logic to deallocate memory. I for one have found it beneficial to not have to worry about deallocating subscribers. Especially when you start getting > 10 nodes in a reactive domain model. Forcing the application programmer to be concerned about deallocation adds unnecessary complexity to apps in many cases.

btakita commented 1 month ago

@littledan @alxhub

JS is a Garbage Collected language. With the UI library, an unload event can be listened to unwatch the signal. Outside of a UI library, there is no such event. WeakRef is the "least bad" option that I can think of. Since I have been using this technique, I have not had any wierd issues. Tbh the most frustrating thing has been the FUD by people who havnt tried it. That & having an unpopular opinion on an idiom that demonstrably works well. If people have a problem with GC & WeakRef...If they are footguns. I'd like to see a demonstration as to why. Along with an explanation as to why GC + WeakRef are in the standards if they are a footgun. Cause I have not really seen a problem yet. GC & WeakRef makes sense to me.

I started with reactive systems over 15 years ago. Wrote my first JS signals library in 2013. Used several popular JS signal libraries since. rmemo has been the best dev experience I have had so far...bar none. I designed rmemo to have a similar idioms & lifecyle as regular functions...to be an "extension" of a regular function. So it's not intrusive to application data architecture.

Perhaps this is an antipattern & there is a better alternative? I'm all ears. I'm always game to improve & would love to see an even better way.


Edit. It turns out the V8's WeakRef has performance issues. I filed a bug. Bun, running on JSCore had a fast implementation of WeakRef. With that being said I no longer think adding WeakValueMap is a good idea. Since the initial desire was based on reacting to a V8 performance bug.

shaylew commented 1 month ago

This seems to be two similar but separate issues:

  1. Should a watcher strongly retain its sources?
  2. Should a watcher that isn't referenced, except through the graph, be disposed of and stop watching/firing its callback?

(2) is the one that exposes finalization timing behavior. If the watched signals are still alive and still changing, when exactly the watcher stops running would be up to the gc. This is unpredictable, and probably undesirable; all corners of the JS spec that have this nondeterminism baked in carry a serious warning label and are (rightfully) under heavy scrutiny.

(1) has no such complication, because a source whose only remaining reference is from a watcher can't change. So I think we should probably arrange for watchers to not retain their unchanged dependences. Changed dependencies may be necessary to keep track of, in order to re-watch them; so I imagine the implementation here would look like maintaining only the notifiedBy list on the watcher, rather than the full set of sources.

alxhub commented 1 month ago

Should a watcher strongly retain its sources?

At least with "correct" usage, I think watchers will never have the only remaining reference to a source, because in order to be a source in the first place, it'd have to be reachable within the computation being watched (part of its closure).

The only way for this not to be true is if the computation function changes in a way that:

Which is user error (although definitely a sharp edge).

shaylew commented 1 month ago

@alxhub That's all true of Compteds, but Watchers in this proposal don't have a tracked body like that -- they can add/remove edges with watch and unwatch however they like.

Definitely more of an edge case in some usage patterns than others, but I think it might come up in practice decently often. Some frameworks are likely to use a pattern where the effect body is a Computed and downstream of it is a Watcher used for scheduling, and in that setup there may be no other reference to the effect (or its closure body) except through the watcher.

alxhub commented 1 month ago

@shaylew ahhh, yeah, I was talking about a dependency that was upstream of a watched Computed. I see what you mean - if you watch a Signal.State and every other reference gets dropped, it can no longer change.