Status update?

dead-claudia commented 7 years ago

This proposal hasn't really gotten much action lately, so I'm just curious how it stands right now. Driven by this inquiry, but thought I'd file an issue over here instead.

0815fox commented 7 years ago

BUMP - we really want this to happen. Every app that goes beyond simple website manipulation and reuses objects by caching them in a map will tend to suffer from memory leaks, which its developers have to work around by doing something like manual reference counting.
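A minimal sketch of the kind of workaround described above, for illustration only. The `RefCountedCache` class and its `acquire`/`release` methods are hypothetical names, not taken from any library or from the poster's code; the point is that a plain `Map` holds its values strongly, so the application has to count users by hand.

```javascript
// Hypothetical sketch: a cache that counts its users manually because
// a plain Map keeps every cached value strongly reachable.
class RefCountedCache {
  constructor(create) {
    this.create = create;     // factory called on a cache miss
    this.entries = new Map(); // key -> { value, count }
  }

  // Return the cached value for key, creating it on first use.
  acquire(key) {
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = { value: this.create(key), count: 0 };
      this.entries.set(key, entry);
    }
    entry.count += 1;
    return entry.value;
  }

  // Every acquire() must be paired with a release();
  // forgetting one keeps the entry in the map forever.
  release(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return;
    entry.count -= 1;
    if (entry.count === 0) this.entries.delete(key);
  }
}
```

The fragile part is the pairing: one missed `release` call is enough to leak an entry, which is the burden weak references are meant to remove.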

bmeck commented 7 years ago

@0815fox we have a test prototype on v8 now: https://github.com/bmeck/v8/tree/weakref. It won't be landing unflagged anytime soon since the proposal is still stage one, but I would love some feedback.

PaulBone commented 7 years ago

I've added this to our bugtracker.

https://bugzilla.mozilla.org/show_bug.cgi?id=1367476

erights commented 7 years ago

Hi @bmeck , thanks for doing this!

What's the simplest way to obtain or build a node with this feature, whether behind a flag or not? I'd like to start using this immediately to experiment with a remote object system along the lines of CapTP / Cap'n Proto. Thanks!

bmeck commented 7 years ago

@erights for now, placing my branch into deps/v8 in node's source code should work. It might have minor bugs, but I'm trying to merge it behind a flag into v8 proper. d8 is what I have been using for testing.

dtribble commented 7 years ago

Behind the scenes, we were working on a prototype implementation. After a few partial attempts, @bmeck made a real implementation in v8 (for node). I've written some tests in it, and am now inviting good examples to implement on it.

Folks on the Chrome browser team expressed concerns about the potential impacts of exposing GC and finalization non-determinism. For example, what happens if major sites or libraries end up depending on specifics of finalization timing? That might prevent GC improvement, create new performance hiccups in different memory environments or across different browsers and versions, etc. The design attempts to significantly address those concerns. I plan to file issues to surface specific sources of non-determinism risk so we can discuss how they are mitigated and help determine whether they are mitigated enough to minimize their concerns.

Jamesernator commented 7 years ago

@dtribble When you say finalization timing do you mean as in when garbage collection happens? e.g.:

async function foo() {
    let garbage = new Foo()
    // Perhaps an engine keeps around the original object for debugging info or whatever
    const somePromise = garbage.fizz()
    const ref = new Promise(resolve => makeWeakRef(garbage, resolve))

    // Destroy the garbage object
    await ref // This might not be reached depending on how the implementation deals with garbage.fizz()
    await somePromise
}

Or the actual order things are garbage collected in? e.g.

let o = {
    x: new Foo(),
    y: new Bar(),
}

makeWeakRef(o.x, _ => console.log("x"))
makeWeakRef(o.y, _ => console.log("y"))

o = null

// Order of console.logs indeterminate

dtribble commented 7 years ago

Both are worth discussion, though issues like your first sample are typically more subtle. The "order of finalization" is partly addressed by "don't depend on order", but can be helped by, for example, requiring implementations to randomize finalization order so that order dependencies surface quickly in testing.
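As a rough illustration of the randomization idea, the sketch below simulates a finalization queue that shuffles pending callbacks before running them. It is a toy model in plain JavaScript, not engine code: `runFinalizersShuffled` and its `random` parameter are hypothetical, and real finalization is driven by the garbage collector rather than by an explicit list.

```javascript
// Toy model only: shuffle pending finalization callbacks (Fisher-Yates)
// so that code relying on one particular order fails quickly under test.
function runFinalizersShuffled(pending, random = Math.random) {
  const queue = pending.slice(); // do not mutate the caller's array
  for (let i = queue.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [queue[i], queue[j]] = [queue[j], queue[i]];
  }
  for (const callback of queue) callback();
}

const seen = [];
runFinalizersShuffled([
  () => seen.push("x"),
  () => seen.push("y"),
]);
// seen now holds "x" and "y" in an unspecified order
```

Passing a fixed `random` function makes a run reproducible, which can help when an order dependency has been found and needs to be replayed.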

daurnimator commented 7 years ago

So what's the current situation? Who needs to be poked to keep moving things forward?

0815fox commented 7 years ago

@bmeck first, sorry for my late reply. Our current product mainly needs this on the browser side, as the browser application caches the results it gets from the server after preprocessing them. So at the moment I do not have a use case for it on the node.js side. I could only offer to do some trivial experiments with it, in case that would help.

Is the way to obtain/build as you described above still the desired way?

dtribble commented 7 years ago

@0815fox yes that's still the right way.

@daurnimator I'm the one that needs to be poked :). Next steps are writing up the non-determinism issues and working out any additional mitigations. Concurrent with that is building examples.

daurnimator commented 7 years ago

@dtribble anything I can do to help things along?

nguyenbs commented 7 years ago

Thank you all for keeping this thing moving forward.

We also urgently need this feature.

dtribble commented 6 years ago

I currently plan to have an update at the March TC39 meeting.

dtribble commented 6 years ago

The presentation is updated. The proposal document update is in progress.

dtribble commented 6 years ago

A first round of spec-text is now pushed for the new APIs at https://github.com/tc39/proposal-weakrefs/blob/master/specs/spec.md.

Last week I updated the API and presentation based on feedback. A note on the API change in the last presentation: it looks bigger than it actually is.

1) Some of WeakRef was pushed into a new parent, "WeakCell", so that it could better support long turns in wasm, and, because there are now two types, 2) WeakRefGroup was renamed to WeakFactory.

WeakRef is unchanged (and creation still preserves the Target until the end of turn). The new "WeakCell" is for finalization only, so it doesn't have a deref() and creation does not strongly preserve the Target until the end of the turn.

daurnimator commented 6 years ago

> WeakRefGroup was renamed to WeakFactory.

I'm not sure that was the best choice. Other weak-related features may be introduced later: we already have WeakMap, and who knows if we'll end up with weak-sets or other such things.

dead-claudia commented 6 years ago

@daurnimator We already have weak-keyed WeakMap and WeakSet. The thing we lack is a map with weak values, but that's solvable with weak refs pretty easily.
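A minimal sketch of such a weak-valued map. It assumes the `WeakRef` and `FinalizationRegistry` globals that were later standardized in ES2021, not the `WeakFactory` API under discussion in this thread, and the `WeakValueMap` class itself is hypothetical. Values must be objects, since `WeakRef` rejects primitives, and a key must not be the same object as its value.

```javascript
// Hypothetical sketch: keys are held strongly, values only weakly.
class WeakValueMap {
  constructor() {
    this.refs = new Map(); // key -> WeakRef(value)
    // Drop the map entry some time after its value has been collected.
    this.registry = new FinalizationRegistry((key) => {
      const ref = this.refs.get(key);
      // Delete only if the entry was not replaced by a live value meanwhile.
      if (ref !== undefined && ref.deref() === undefined) {
        this.refs.delete(key);
      }
    });
  }

  set(key, value) {
    this.refs.set(key, new WeakRef(value));
    this.registry.register(value, key); // key is passed to the cleanup callback
    return this;
  }

  // Returns the value, or undefined if it was never set or has been collected.
  get(key) {
    const ref = this.refs.get(key);
    return ref === undefined ? undefined : ref.deref();
  }

  has(key) {
    return this.get(key) !== undefined;
  }
}
```

Callers have to treat a missing value as a normal outcome of `get`, because the collector decides when an entry disappears.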

However, I also feel the name isn't quite optimal. I feel something like WeakCellGroup or WeakCellFactory would work better, something that more fully encapsulates what it manages.

ljharb commented 6 years ago

WeakValueMap?

dead-claudia commented 6 years ago

@ljharb As a replacement for WeakFactory?

ljharb commented 6 years ago

Not sure; the names "cell" and "factory" don't give me an intuition about how they work and what they do. My understanding is that there's a possible primitive for "weakly pointing to a single JS value" and "a collection that weakly points to its JS values" - the latter to me seems like a "WeakMap" except that it's the values, not the keys, that are weak - thus WeakValueMap.

A "FooFactory" to me is a function that when called, returns a "Foo" - so a "WeakFactory" would need to produce "Weaks".

daurnimator commented 6 years ago

> My understanding is that there's a possible primitive for "weakly pointing to a single JS value" and "a collection that weakly points to its JS values" - the latter to me seems like a "WeakMap" except that it's the values, not the keys, that are weak - thus WeakValueMap.

There's also an additional useful primitive: a map with both weak keys and values (which in practice can be cheaper, as the GC can skip iterating the map at all when sweeping!)

ljharb commented 6 years ago

@daurnimator that's easy to create tho with a WeakMap + a WeakRef, no?
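One way that combination could look, as a sketch only. It assumes the `WeakRef` constructor as later standardized in ES2021 instead of the `makeRef` API discussed here, and `DoublyWeakMap` is a hypothetical name. Both keys and values must be objects.

```javascript
// Hypothetical sketch: keys are weak via WeakMap, values are weak via WeakRef.
class DoublyWeakMap {
  constructor() {
    this.map = new WeakMap(); // key object -> WeakRef(value object)
  }

  set(key, value) {
    this.map.set(key, new WeakRef(value));
    return this;
  }

  // Returns the value, or undefined if it was never set or has been collected.
  get(key) {
    const ref = this.map.get(key);
    return ref === undefined ? undefined : ref.deref();
  }
}
```

No cleanup callback is needed for the keys, since the `WeakMap` drops an entry together with its key. An entry whose value died while its key is still alive simply keeps a small, empty `WeakRef` around.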

dtribble commented 6 years ago

As long as we are bikeshedding...

WeakFactory used to be WeakRefGroup, and it was about the creation and collection of a bunch of related weak references. However that name doesn't work once we added WeakCell (and it's not a collection class). I am not a big fan, and I agree that it may be too broad a name given that it doesn't have anything to do with WeakMap or WeakSet. So suggestions welcome :)

daurnimator commented 6 years ago

> @daurnimator that's easy to create tho with a WeakMap + a WeakRef, no?

Yes it is, but such a construct would no doubt have less than ideal performance.

All such things can be built on top of WeakRef once we have it; it's the true new primitive from a semantics perspective. (well.... unless we need to access the object for finalisation)

zenparsing commented 6 years ago

Thanks @dtribble for continuing to push this forward! I think there are some API tweaks that will make this more JS-like.

Looking at semantics though, why do we have WeakCell? I see that it has to do with the WASM use case but I'm unsure how it fits in. Apologies if this has been covered elsewhere.

tschneidereit commented 6 years ago

@zenparsing WeakCell helps the Wasm case by never creating a strong reference. Since WeakRefs keep a strong reference until the end of the turn after a) being constructed, and b) being dereferenced, they prevent the referenced object from being collected at all if the Wasm module never yields to the event loop. The WeakCell constructor doesn't create a strong reference, so it fixes a), and doesn't allow dereferencing, so fixes b).

The slides give at least some context to this.

zenparsing commented 6 years ago

@tschneidereit Thanks.

I understand why WeakRef creates a strong reference on dereference, but why does a WeakRef create a strong reference on construction?

Or, to put it another way, why do the arguments that lead us toward strong refs on construction of WeakRef not apply equally to WeakCell?

tschneidereit commented 6 years ago

> Or, to put it another way, why do the arguments that lead us toward strong refs on construction of WeakRef not apply equally to WeakCell?

I guess the straightforward answer is that this difference is a large part of the reason for introducing WeakCell in the first place :) I'll let @dtribble give a more substantive answer on why WeakRef creates a strong ref for the current turn on construction because I'm not entirely sure why that is necessary, either.

dtribble commented 6 years ago

WeakRef creates a strong pointer on creation to avoid revealing GC behavior. Here's a simple example

Within a turn:

let foo = new Foo(...);
let wr = this.weakFactory.makeRef(foo, key);
foo = null;
// do some work

if (wr.deref() === undefined) {
  // PROFIT! We just observed that GC ran in the middle of the turn.
}

ljharb commented 6 years ago

@dtribble can you explain why, if "revealing GC behavior" happens with a WeakCell, it matters that WeakRef avoids revealing that?

dtribble commented 6 years ago

Revealing GC behavior is much more difficult with a WeakCell because you cannot do the test at the end.

The other reason we want weakRefs to be strong till the end of the turn is:

let wr = this.weakFactory.makeRef(new Foo(...), ...);
// do some stuff
return wr.deref();

In this pattern (which can emerge in reasonable code), the object is returned from deref, but could have gotten collected before that was called. I can argue that the programmer should have held onto that reference strongly elsewhere, but that's not always possible, and even good programmers don't always do that right. The consistency property just eliminates this possible footgun.

zenparsing commented 6 years ago

> Revealing GC behavior is much more difficult with a WeakCell because you cannot do the test at the end.

Can you observe the collection of the WeakCell-wrapped object (within the turn) using the cleanup function?

dtribble commented 6 years ago

The automatic cleanup function gets scheduled in its own turn, so you cannot observe it that way during a turn.

You could potentially find out via cleanupSome.

That's what reminded me of the primary motivation, avoiding the failure mode in returning wr.deref() later.

zenparsing commented 6 years ago

> You could potentially find out via cleanupSome.

If in-turn GC is observable by any means, then I think the motivation for creating a strong reference on construction is reduced; all that's left is the read consistency argument and that doesn't seem to apply since we don't have multiple "reads".

If we don't create a strong reference on construction, then it seems like WeakCell is equivalent to a WeakRef in which the user never calls deref. If so, then perhaps we can simplify things and just have WeakRef. Users that don't want to create "until end of turn" strong references could just opt not to call deref.

erights commented 6 years ago

> If in-turn GC is observable by any means, then I think the motivation for creating a strong reference on construction is reduced;

Correct. The actual motivation is subtler. Even before we introduced WeakCell, we did not have true read consistency. Demonstration:

let foo = new Foo();
const wr = wrFactory.makeRef(foo);
//---- turn boundary ----
foo.bar();
foo = null;
//---- gc happens, reclaims foo ----
console.log(wr.deref());  // undefined

In all versions of our proposal, this violates read consistency because the use of foo.bar() observes by normal means that foo is still around. However, we would be insane to propose additional bookkeeping to keep track of this observation. Thus, for our approximation of read consistency, we only consider observations-of-non-reclamation through the "normal" operations of the weak reference system.

Without this insane extra bookkeeping, we never had the ability to fully prevent the observation of in-turn gc, and so never had full read consistency. So what were we ever accomplishing?

But first, some history. The hazard that blocked weakrefs for a long time is a sensible objection:

People will write code that is incorrect by the specification, but happens to work on all implementations and under testing because of the way unspecified timing just happens to work out. This code gets shipped. For totally innocent reasons, such as an improvement in gc algorithm, the timing changes and that code starts breaking. The normal browser game theory pathology kicks in: users don't punish the broken web site. Rather, they switch to a browser on which that site still works. The browser backs out of its gc improvement. There follows the normal nightmare that is too horrible to recount :/

What changed is that wasm's need for weakrefs was compelling enough for the browser makers to be willing to pay the costs above in order to have weak refs. There's no way for them to get the weakrefs they need without paying these costs. Having decided to pay these costs, this weakref proposal is the best way forward because it mitigates the most severe form of this hazard.

Our slides have examples like

wr.deref() && wr.deref().foo()

not because we think this is good code, but because we consider it inevitable that people will write code like this. This code will almost always happen to act correctly, so the problem likely won't be caught by testing. There are plenty of non-deterministic interleaving hazards that JS programmers cope with turn-to-turn, because turns are the transaction-like unit, with other turns interleaving unpredictably between them. (The promise design was meant to mitigate similar hazards arising from this unpredictable interleaving of turns.) So non-deterministic interleaving of the observable effects of gc at turn boundaries is the most tolerable form of this hazard.
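For comparison, a more defensive version of that pattern reads the target once into a local variable, so the check and the call operate on the same strong reference. This is a sketch assuming the `WeakRef` constructor as later standardized in ES2021 rather than `makeRef`; `callFooIfAlive` is a hypothetical helper.

```javascript
// Hypothetical helper: dereference once, then use the strong local.
function callFooIfAlive(wr) {
  const target = wr.deref(); // single read; target is held strongly from here on
  return target === undefined ? undefined : target.foo();
}

const obj = { foo: () => "called" };
const wr = new WeakRef(obj);
callFooIfAlive(wr); // returns "called" while obj is still strongly reachable
```

Under the end-of-turn guarantee discussed in this thread, the double `deref()` form also behaves correctly within a turn; the single read just does not depend on that guarantee.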

Within a turn, programmers apply conventional sequential imperative intuitions, appropriate for code within a transaction-like unit. We are concerned with code patterns that we expect both to be frequent, because they use "normal" operations, and to often pass all tests. For code that happens to work reliably under such conditions, we'd like as much of that code as practical to actually be correct. Modulo other costs of course; it is a tradeoff.

So the similar example that I consider just as pressing is:

const wr = wrFactory.makeRef(foo());
console.log(wr.deref());

The problem is that both of these are very "normal" operations. cleanupSome is not. That's why makeRef needs to be considered an observation but makeCell does not.

erights commented 6 years ago

The talk and paper "Uncanny Valleys in Declarative Language Design" (https://www.youtube.com/watch?v=hQ4Y-eAOZ-8 and https://research.google.com/pubs/pub45983.html) are about this "uncanny zone" of language design, of programs that are both intuitive and always happen to work correctly. Although Yedalog is an extraordinarily different language, our core response is still the same. When possible, change the spec and implementations so that such programs actually become correct.

zenparsing commented 6 years ago

@erights

This explanation was very helpful. I certainly agree that we should protect programmers from surprises in the case of:

wr.deref() && wr.deref().foo()

I'm slightly less convinced by:

const wr = wrFactory.makeRef(new Foo());
wr.deref().foo();

but I'm willing to assume that it's common enough.

My primary concern is the surface area of this new API. I was hopeful that there would be one new globally-scoped constructor, WeakRef, but with the new design we have WeakFactory, WeakCell, and WeakRef, and an inheritance relationship between WeakCell and WeakRef. That's quite a lot of conceptual overhead for a programmer that wants to create something simple, like a weakly-held list of event listeners.
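To make the "weakly-held list of event listeners" use case concrete, here is a minimal sketch using only a single `WeakRef` constructor, as later standardized in ES2021, instead of the WeakFactory/WeakCell design discussed above. The `WeakListenerList` class is hypothetical.

```javascript
// Hypothetical sketch: listeners are held only through WeakRef, so
// registering one does not keep it alive.
class WeakListenerList {
  constructor() {
    this.refs = new Set(); // Set of WeakRef objects wrapping listener functions
  }

  add(listener) {
    this.refs.add(new WeakRef(listener));
  }

  // Call every listener that is still alive and prune the dead ones.
  emit(...args) {
    for (const ref of this.refs) {
      const listener = ref.deref();
      if (listener === undefined) {
        this.refs.delete(ref); // deleting during Set iteration is safe
      } else {
        listener(...args);
      }
    }
  }
}
```

The caller has to keep each listener strongly reachable somewhere else, for example as a property of the object that owns it. Otherwise the listener may be collected and silently stop firing.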

I'm curious if there are other API options that would give us the same benefits. Would it be possible for WeakRefs to have a method that "re-weakens" them after construction or deref?

tc39 / proposal-weakrefs

Status update? #15