Closed: dead-claudia closed this 6 years ago.
BUMP: we really want this to happen. Every app that goes beyond simple website manipulation and reuses objects by caching them in a map tends to suffer from memory leaks, which its developers have to work around with something like manual reference counting.
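The manual reference counting workaround mentioned above tends to look roughly like the following sketch. RefCountedCache, acquire and release are hypothetical names, not taken from any library. Every caller must pair acquire with release, and an entry is evicted only when its count drops to zero; a single forgotten release is exactly the kind of leak weak references are meant to remove.

```javascript
// Hypothetical cache showing manual reference counting as a leak workaround.
// Names (RefCountedCache, acquire, release) are illustrative only.
class RefCountedCache {
  constructor(load) {
    this.load = load;          // function key -> value, called on a cache miss
    this.entries = new Map();  // key -> { value, count }
  }

  // Return the cached value for key (loading it on a miss) and bump its count.
  acquire(key) {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { value: this.load(key), count: 0 };
      this.entries.set(key, entry);
    }
    entry.count += 1;
    return entry.value;
  }

  // Drop one reference; evict the entry once nobody holds it any more.
  release(key) {
    const entry = this.entries.get(key);
    if (!entry) throw new Error(`release of unknown key: ${key}`);
    entry.count -= 1;
    if (entry.count === 0) this.entries.delete(key);
  }

  get size() {
    return this.entries.size;
  }
}

// Usage: two holders share one loaded object; the entry lives until both release.
let loads = 0;
const cache = new RefCountedCache((key) => {
  loads += 1;
  return { key };
});
const a = cache.acquire("user:1");
const b = cache.acquire("user:1");
console.log(a === b, loads, cache.size); // true 1 1
cache.release("user:1");
console.log(cache.size); // 1
cache.release("user:1");
console.log(cache.size); // 0
```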
@0815fox we have a test prototype on V8 now: https://github.com/bmeck/v8/tree/weakref. It won't land unflagged anytime soon, since the proposal is still at stage 1, but we would love some feedback.
I've added this to our bugtracker.
Hi @bmeck, thanks for doing this!
What's the simplest way to obtain or build a Node with this feature, whether behind a flag or not? I'd like to start using it immediately to experiment with a remote object system along the lines of CapTP / Cap'n Proto. Thanks!
@erights for now, placing my branch into deps/v8 in Node's source code should work. It might have minor bugs, but I'm trying to merge it behind a flag into V8 proper. d8 is what I have been using for testing.
Behind the scenes, we have been working on a prototype implementation. After a few partial attempts, @bmeck made a real implementation in V8 (for Node). I've written some tests for it, and am now inviting good examples to implement on it.
Folks on the Chrome browser team expressed concerns about the potential impact of exposing GC and finalization non-determinism. For example, what happens if major sites or libraries end up depending on specifics of finalization timing? That might prevent GC improvements, create new performance hiccups in different memory environments or across different browsers and versions, etc. The design attempts to significantly address those concerns. I plan to file issues to surface specific sources of non-determinism risk so we can discuss how they are mitigated and help determine whether they are mitigated enough to address those concerns.
@dtribble When you say "finalization timing", do you mean when garbage collection happens? E.g.:
async function foo() {
  let garbage = new Foo()
  // Perhaps an engine keeps around the original object for debugging info or whatever
  const somePromise = garbage.fizz()
  // makeWeakRef(target, onCollected) is a hypothetical helper that calls onCollected once target is collected
  const ref = new Promise(resolve => makeWeakRef(garbage, resolve))
  // Destroy the garbage object by dropping our only reference to it
  garbage = null
  await ref // This might not be reached depending on how the implementation deals with garbage.fizz()
  await somePromise
}
Or the order in which things are garbage collected? E.g.:
let o = {
  x: new Foo(),
  y: new Bar(),
}
makeWeakRef(o.x, _ => console.log("x"))
makeWeakRef(o.y, _ => console.log("y"))
o = null
// Order of console.logs indeterminate
Both are worth discussing, though issues like your first sample are typically more subtle. The "order of finalization" problem is partly addressed by "don't depend on order", but it could also be helped by, for example, requiring implementations to randomize finalization order so that order dependencies surface quickly in testing.
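To make the randomization idea concrete, here is a small deterministic simulation, not engine code and not part of the proposal. A hypothetical FinalizationQueue collects pending cleanup callbacks and drains them in an order shuffled by a seeded pseudo-random generator (mulberry32, a well-known small seeded PRNG, chosen here only for brevity). Running a test suite under several seeds would tend to expose code that assumes "x is finalized before y". Real engines expose no such hook; all names are assumptions.

```javascript
// Simulation only: models an engine that randomizes finalization order.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

class FinalizationQueue {
  constructor(seed) {
    this.random = mulberry32(seed);
    this.pending = [];
  }

  // Called when the (simulated) collector decides a target is dead.
  enqueue(callback) {
    this.pending.push(callback);
  }

  // Run every pending callback exactly once, in a Fisher-Yates-shuffled order.
  drain() {
    const jobs = this.pending;
    this.pending = [];
    for (let i = jobs.length - 1; i > 0; i--) {
      const j = Math.floor(this.random() * (i + 1));
      [jobs[i], jobs[j]] = [jobs[j], jobs[i]];
    }
    for (const job of jobs) job();
  }
}

// Usage: record which finalization orders show up across several seeds.
// Code that assumes "x" always runs before "y" would fail for some of them.
const orders = new Set();
for (let seed = 1; seed <= 20; seed++) {
  const queue = new FinalizationQueue(seed);
  const log = [];
  queue.enqueue(() => log.push("x"));
  queue.enqueue(() => log.push("y"));
  queue.drain();
  orders.add(log.join(""));
}
console.log([...orders].sort());
```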
So what's the current situation? Who needs to be poked to keep moving things forward?
@bmeck first, sorry for my late reply. Our current product needs this mainly on the browser side, since the browser application caches the results it gets from the server after preprocessing them. So at the moment I do not have a use case for it on the Node.js side. I could only offer to do some trivial experiments with it, in case that would help.
Is the way to obtain/build as you described above still the desired way?
@0815fox yes that's still the right way.
@daurnimator I'm the one that needs to be poked :). The next steps are writing up the non-determinism issues and working out any additional mitigations. Concurrently, we are building examples.
@dtribble anything I can do to help things along?
Thank you all for keeping this thing moving forward.
We also urgently need this feature.
I currently plan to have an update at the March TC39 meeting.
The presentation is updated. The proposal document update is in progress.
A first round of spec-text is now pushed for the new APIs at https://github.com/tc39/proposal-weakrefs/blob/master/specs/spec.md.
Last week I updated the API and presentation based on feedback. A note on the API change in the last presentation: it looks bigger than it actually is.
1) Part of WeakRef was pushed into a new parent, "WeakCell", so that it could better support long-running turns in Wasm; and, because there are now two types, 2) WeakRefGroup was renamed to WeakFactory.
WeakRef is unchanged (and creation still preserves the Target until the end of turn). The new "WeakCell" is for finalization only, so it doesn't have a deref() and creation does not strongly preserve the Target until the end of the turn.
> WeakRefGroup was renamed to WeakFactory.
I'm not sure that was the best choice; other weak-related features may be introduced later: we already have WeakMap, and who knows whether we'll end up with weak sets or other such things.
@daurnimator We already have weak-keyed WeakMap and WeakSet. The thing we lack is a map with weak values, but that's pretty easily solvable with weak refs.
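For reference, a map with weak values can be sketched on top of the WeakRef and FinalizationRegistry constructors that eventually shipped in ES2021, rather than the WeakFactory/makeRef API discussed in this thread. The class name WeakValueMap is borrowed hypothetically from a suggestion later in this thread. Values must be objects, and the cleanup callback only deletes an entry whose value is actually dead, so re-setting a key is safe even if a stale callback fires later.

```javascript
// A map whose values are held weakly. Requires WeakRef and FinalizationRegistry
// (ES2021; Node.js 14.6+). WeakValueMap is a hypothetical name.
class WeakValueMap {
  #refs = new Map(); // key -> WeakRef wrapping the value
  #registry = new FinalizationRegistry((key) => {
    // Runs some time after a value is collected (timing is unspecified).
    // Only delete if the entry still points at a dead value: the key may
    // have been re-set to a live value in the meantime.
    const ref = this.#refs.get(key);
    if (ref !== undefined && ref.deref() === undefined) {
      this.#refs.delete(key);
    }
  });

  set(key, value) {
    const old = this.#refs.get(key);
    if (old !== undefined) this.#registry.unregister(old);
    const ref = new WeakRef(value); // throws TypeError for primitive values
    this.#refs.set(key, ref);
    // heldValue = key (passed to the cleanup callback); unregister token = ref.
    this.#registry.register(value, key, ref);
    return this;
  }

  get(key) {
    const ref = this.#refs.get(key);
    return ref === undefined ? undefined : ref.deref();
  }

  has(key) {
    return this.get(key) !== undefined;
  }

  delete(key) {
    const ref = this.#refs.get(key);
    if (ref === undefined) return false;
    this.#registry.unregister(ref);
    return this.#refs.delete(key);
  }
}

const cache = new WeakValueMap();
const user = { name: "ada" };
cache.set("u1", user);
console.log(cache.get("u1") === user); // true
console.log(cache.delete("u1"), cache.has("u1")); // true false
```

While user stays strongly reachable, get returns it; once it is collected, get starts returning undefined and the registry eventually prunes the stale entry.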
However, I also feel the name isn't quite optimal. I feel something like WeakCellGroup or WeakCellFactory would work better, something that more fully encapsulates what it manages.
WeakValueMap?
@ljharb As a replacement for WeakFactory?
Not sure; the names "cell" and "factory" don't give me an intuition about how they work and what they do. My understanding is that there's a possible primitive for "weakly pointing to a single JS value" and "a collection that weakly points to its JS values" - the latter to me seems like a "WeakMap" except that it's the values, not the keys, that are weak - thus WeakValueMap.
A "FooFactory" to me is a function that, when called, returns a "Foo", so a "WeakFactory" would need to produce "Weaks".
> My understanding is that there's a possible primitive for "weakly pointing to a single JS value" and "a collection that weakly points to its JS values" - the latter to me seems like a "WeakMap" except that it's the values, not the keys, that are weak - thus WeakValueMap.
There's also an additional useful primitive: a map with both weak keys and values (which in practice can be cheaper, as the GC can skip iterating the map at all when sweeping!)
@daurnimator that's easy to create tho with a WeakMap + a WeakRef, no?
As long as we are bikeshedding...
WeakFactory used to be WeakRefGroup, and it was about the creation and collection of a bunch of related weak references. However, that name no longer works now that we've added WeakCell (and it's not a collection class). I am not a big fan, and I agree that it may be too broad a name given that it doesn't have anything to do with WeakMap or WeakSet. So suggestions welcome :)
> @daurnimator that's easy to create tho with a WeakMap + a WeakRef, no?
Yes, it is; but such a construct would no doubt have less-than-ideal performance.
All such things can be built on top of WeakRef once we have it; it's the true new primitive from a semantics perspective. (Well... unless we need to access the object for finalisation.)
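The WeakMap + WeakRef composition suggested above could look like the following sketch, using the shipped WeakMap and ES2021 WeakRef. The WeakMap makes the key weak and the WeakRef makes the value weak, so a lookup returns undefined once the value has been collected, and the whole entry goes away once the key is collected. One visible source of overhead in this sketch is that every set allocates an extra WeakRef object and every get goes through deref(). WeakKeyValueMap is a hypothetical name.

```javascript
// Entries are held weakly on both sides: WeakMap for the key, WeakRef for the value.
// Keys and values must both be objects. Requires WeakRef (ES2021).
// WeakKeyValueMap is a hypothetical name.
class WeakKeyValueMap {
  #map = new WeakMap(); // key -> WeakRef(value)

  set(key, value) {
    this.#map.set(key, new WeakRef(value));
    return this;
  }

  get(key) {
    // undefined if the key was never set or the value has been collected.
    return this.#map.get(key)?.deref();
  }

  has(key) {
    return this.get(key) !== undefined;
  }

  delete(key) {
    return this.#map.delete(key);
  }
}

const ephemeral = new WeakKeyValueMap();
const node = {};
const metadata = { label: "root" };
ephemeral.set(node, metadata);
console.log(ephemeral.get(node) === metadata); // true
console.log(ephemeral.get({})); // undefined
```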
Thanks @dtribble for continuing to push this forward! I think there are some API tweaks that will make this more JS-like.
Looking at semantics though, why do we have WeakCell? I see that it has to do with the WASM use case but I'm unsure how it fits in. Apologies if this has been covered elsewhere.
@zenparsing WeakCell helps the Wasm case by never creating a strong reference. Since WeakRefs keep a strong reference until the end of the turn after a) being constructed, and b) being dereferenced, they prevent the referenced object from being collected at all if the Wasm module never yields to the event loop. The WeakCell constructor doesn't create a strong reference, so it fixes a), and doesn't allow dereferencing, so it fixes b).
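The difference between a) and b) can be modeled without a real garbage collector. In this deterministic toy model, which is not the proposal's API, a hypothetical Heap tracks which objects the program still roots, plus a keptAlive set that is cleared only at a turn boundary. A ref-style handle adds its target to keptAlive on construction and on deref, while a cell-style handle never does. Collecting in the middle of one long turn (like a Wasm module that never yields) then reclaims the cell's target but not the ref's.

```javascript
// Toy model of the "strong until end of turn" rule; not a real GC or the proposal's API.
// Heap, SimWeakRef, SimWeakCell, collect and endTurn are hypothetical names.
// Local JS variables below are only labels: reachability is whatever heap.roots says.
class Heap {
  constructor() {
    this.roots = new Set();     // objects the program still references strongly
    this.keptAlive = new Set(); // targets pinned until the current turn ends
    this.dead = new Set();      // objects the simulated collector has reclaimed
  }

  // Reclaim every candidate that is neither rooted nor pinned for this turn.
  collect(candidates) {
    for (const obj of candidates) {
      if (!this.roots.has(obj) && !this.keptAlive.has(obj)) this.dead.add(obj);
    }
  }

  endTurn() {
    this.keptAlive.clear(); // the turn boundary releases all pins
  }
}

class SimWeakRef {
  constructor(heap, target) {
    this.heap = heap;
    this.target = target;
    heap.keptAlive.add(target); // (a) construction pins the target
  }

  deref() {
    if (this.heap.dead.has(this.target)) return undefined;
    this.heap.keptAlive.add(this.target); // (b) dereferencing pins it again
    return this.target;
  }
}

class SimWeakCell {
  constructor(heap, target) {
    this.heap = heap;
    this.target = target; // no pin on construction, and no deref() at all
  }

  // Stand-in for "the cleanup callback would eventually be scheduled".
  isCollected() {
    return this.heap.dead.has(this.target);
  }
}

// Usage: one long turn, with a collection happening in the middle of it.
const heap = new Heap();
const a = { name: "a" };
const b = { name: "b" };
const ref = new SimWeakRef(heap, a);
const cell = new SimWeakCell(heap, b);
heap.collect([a, b]); // neither is rooted; only the WeakRef's target is pinned
console.log(ref.deref() === a, cell.isCollected()); // true true
heap.endTurn();
heap.collect([a]);
console.log(ref.deref()); // undefined
```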
The slides give at least some context to this.
@tschneidereit Thanks.
I understand why WeakRef creates a strong reference on dereference, but why does a WeakRef create a strong reference on construction?
Or, to put it another way, why do the arguments that lead us toward strong refs on construction of WeakRef not apply equally to WeakCell?
> Or, to put it another way, why do the arguments that lead us toward strong refs on construction of WeakRef not apply equally to WeakCell?
I guess the straightforward answer is that this difference is a large part of the reason for introducing WeakCell in the first place :) I'll let @dtribble give a more substantive answer on why WeakRef creates a strong ref for the current turn on construction, because I'm not entirely sure why that is necessary, either.
WeakRef creates a strong pointer on creation to avoid revealing GC behavior. Here's a simple example:
Within a turn:
let foo = new Foo(...);
let wr = this.weakFactory.makeRef(foo, key);
foo = null;
// do some work
if (wr.deref() === undefined) {
  // PROFIT!
}
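In the API that eventually shipped (ES2021 WeakRef, constructed with new WeakRef(target) rather than a factory's makeRef), this guarantee survived: the specification adds the target to a kept-objects list on construction and on every successful deref(), and that list is only cleared once the current synchronous job completes. So the "PROFIT!" branch cannot run within the same turn, however much garbage the intervening work creates. A minimal runnable check, with an arbitrary allocation loop standing in for "do some work" and Foo as a stand-in class:

```javascript
// Requires WeakRef (ES2021; Node.js 14.6+). Foo is a stand-in class.
class Foo {
  constructor(id) {
    this.id = id;
  }
}

function sameTurnCheck() {
  let foo = new Foo(1);
  const wr = new WeakRef(foo); // construction keeps foo alive until this job ends
  foo = null; // drop the only strong reference

  // "Do some work": allocate plenty of garbage to give the GC a reason to run.
  let junk = [];
  for (let i = 0; i < 100000; i++) junk.push({ i });
  junk = null;

  // Guaranteed within the same synchronous job: this is never undefined.
  return wr.deref();
}

const result = sameTurnCheck();
console.log(result instanceof Foo, result.id); // true 1
```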
@dtribble can you explain why, if "revealing GC behavior" happens with a WeakCell, it matters that WeakRef avoids revealing that?
Revealing GC behavior is much more difficult with a WeakCell because you cannot do the test at the end.
The other reason we want weakRefs to be strong till the end of the turn is:
let wr = this.weakFactory.makeRef(new Foo(..), ...);
// do some stuff
return wr.deref();
In this pattern (which can emerge in reasonable code), the object is returned from deref, but it could have been collected before deref was called. I can argue that the programmer should have held onto that reference strongly elsewhere, but that's not always possible, and even good programmers don't always get that right. The consistency property simply eliminates this possible footgun.
> Revealing GC behavior is much more difficult with a WeakCell because you cannot do the test at the end.
Can you observe the collection of the WeakCell-wrapped object (within the turn) using the cleanup function?
The automatic cleanup function gets scheduled in its own turn, so you cannot observe it that way during a turn.
You could potentially find out via cleanupSome.
That's what reminded me of the primary motivation: avoiding the failure mode of returning wr.deref() later.
> You could potentially find out via cleanupSome.
If in-turn GC is observable by any means, then I think the motivation for creating a strong reference on construction is reduced; all that's left is the read consistency argument and that doesn't seem to apply since we don't have multiple "reads".
If we don't create a strong reference on construction, then it seems like WeakCell is equivalent to a WeakRef on which the user never calls deref. If so, then perhaps we can simplify things and just have WeakRef. Users that don't want to create "until end of turn" strong references could simply opt not to call deref.
> If in-turn GC is observable by any means, then I think the motivation for creating a strong reference on construction is reduced;
Correct. The actual motivation is subtler. Even before we introduced WeakCell, we did not have true read consistency. Demonstration:
let foo = new Foo();
const wr = wrFactory.makeRef(foo);
//---- turn boundary ----
foo.bar();
foo = null;
//---- gc happens, reclaims foo ----
console.log(wr.deref()); // undefined
In all versions of our proposal, this violates read consistency because the use of foo.bar() observes by normal means that foo is still around. However, we would be insane to propose additional bookkeeping to keep track of this observation. Thus, for our approximation of read consistency, we only consider observations-of-non-reclamation through the "normal" operations of the weak reference system.
Without this insane extra bookkeeping, we never had the ability to fully prevent the observation of in-turn gc, and so never had full read consistency. So what were we ever accomplishing?
But first, some history. The hazard that blocked weakrefs for a long time is a sensible objection:
People will write code that is incorrect by the specification, but happens to work on all implementations and under testing because of the way unspecified timing just happens to work out. This code gets shipped. For totally innocent reasons, such as an improvement in gc algorithm, the timing changes and that code starts breaking. The normal browser game theory pathology kicks in: users don't punish the broken web site. Rather, they switch to a browser on which that site still works. The browser backs out of its gc improvement. There follows the normal nightmare that is too horrible to recount :/
What changed is that wasm's need for weakrefs was compelling enough for the browser makers to be willing to pay the costs above in order to have weak refs. There's no way for them to get the weakrefs they need without paying these costs. Having decided to pay these costs, this weakref proposal is the best way forward because it mitigates the most severe form of this hazard.
Our slides have examples like
wr.deref() && wr.deref().foo()
not because we think this is good code, but because we consider it inevitable that people will write code like this. This code will almost always happen to act correctly, so the problem likely won't be caught by testing. JS programmers already cope with plenty of non-deterministic interleaving hazards from turn to turn, because turns are the transactional-like unit, and other turns interleave unpredictably between them. (The promise design was meant to mitigate similar hazards arising from this unpredictable interleaving of turns.) So non-deterministic interleaving of the observable effects of GC at turn boundaries is the most tolerable form of this hazard.
Within a turn, programmers apply conventional sequential imperative intuitions, appropriate for code within a transactional-like unit. This is of particular concern for code patterns we expect both to be frequent, because they use "normal" operations, and to often pass all tests. For code that happens to work reliably under such conditions, we'd like as much of that code as practical to actually be correct, modulo other costs of course; it is a tradeoff.
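Under the end-of-turn rule, wr.deref() && wr.deref().foo() is in fact correct, because a successful first deref() keeps the target alive for the rest of the turn. A pattern that doesn't lean on that rule, and still reads naturally, is to dereference once into a local and branch on it. A sketch with the shipped ES2021 WeakRef and a hypothetical callIfAlive helper:

```javascript
// Requires WeakRef (ES2021). callIfAlive is a hypothetical helper name.
function callIfAlive(wr, method, ...args) {
  const target = wr.deref(); // one strong local reference for the rest of this call
  if (target === undefined) return { alive: false };
  return { alive: true, value: target[method](...args) };
}

const counter = {
  n: 0,
  foo() {
    this.n += 1;
    return this.n;
  },
};
const wr = new WeakRef(counter);
console.log(callIfAlive(wr, "foo")); // { alive: true, value: 1 }
```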
So the similar example that I consider just as pressing is:
const wr = wrFactory.makeRef(foo());
console.log(wr.deref());
The problem is that both of these are very "normal" operations; cleanupSome is not. That's why makeRef needs to be considered an observation but makeCell does not.
The talk and paper Uncanny Valleys in Declarative Language Design https://www.youtube.com/watch?v=hQ4Y-eAOZ-8 https://research.google.com/pubs/pub45983.html is about this "uncanny zone" of language design, of programs that are both intuitive and always happen to work correctly. Although Yedalog is an extraordinarily different language, our core response is still the same. When possible, change the spec and implementations so that such programs actually become correct.
@erights
This explanation was very helpful. I certainly agree that we should protect programmers from surprises in the case of:
wr.deref() && wr.deref().foo()
I'm slightly less convinced by:
const wr = wrFactory.makeRef(new Foo());
wr.deref().foo();
but I'm willing to assume that it's common enough.
My primary concern is the surface area of this new API. I was hopeful that there would be one new globally scoped constructor, WeakRef, but with the new design we have WeakFactory, WeakCell, and WeakRef, and an inheritance relationship between WeakCell and WeakRef. That's quite a lot of conceptual overhead for a programmer who wants to create something simple, like a weakly held list of event listeners.
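For comparison, the weakly held listener list can be sketched with the two constructors that eventually shipped in ES2021, WeakRef and FinalizationRegistry, rather than the WeakFactory/WeakCell design under discussion here. WeakListenerList is a hypothetical name. Collected listeners are pruned lazily during dispatch and by the registry in the background. The usual caveat applies: the caller must keep each listener strongly reachable elsewhere, or it may silently stop firing.

```javascript
// Requires WeakRef and FinalizationRegistry (ES2021). WeakListenerList is hypothetical.
class WeakListenerList {
  #refs = new Set(); // Set of WeakRef(listener)
  // When a listener is collected, drop its (now empty) WeakRef wrapper.
  #registry = new FinalizationRegistry((ref) => this.#refs.delete(ref));

  add(listener) {
    const ref = new WeakRef(listener);
    this.#refs.add(ref);
    // heldValue and unregister token are both the WeakRef wrapper.
    this.#registry.register(listener, ref, ref);
  }

  remove(listener) {
    for (const ref of this.#refs) {
      if (ref.deref() === listener) {
        this.#refs.delete(ref);
        this.#registry.unregister(ref);
      }
    }
  }

  dispatch(...args) {
    for (const ref of this.#refs) {
      const listener = ref.deref();
      if (listener === undefined) this.#refs.delete(ref); // collected: prune lazily
      else listener(...args);
    }
  }

  get size() {
    return this.#refs.size;
  }
}

const events = new WeakListenerList();
const seen = [];
const onMessage = (msg) => seen.push(msg); // must stay strongly reachable to keep firing
events.add(onMessage);
events.dispatch("hello");
events.remove(onMessage);
events.dispatch("ignored");
console.log(seen); // [ 'hello' ]
```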
I'm curious if there are other API options that would give us the same benefits. Would it be possible for WeakRefs to have a method that "re-weakens" them after construction or deref?
This proposal hasn't gotten much action lately, so I'm curious where it stands right now. This was prompted by this inquiry, but I thought I'd file an issue over here instead.