Allow for garbage collection of cancellation handlers on long-lived tokens

bergus commented 8 years ago

@stefanpenner raised this (I guess) in the July meeting but it was deferred.

I'll use the xhrAdapted and delay function from the docs. Let's assume the following code:

async function reload(el, url, token) {
    await.cancelToken = token;
    while (true) {
        el.innerHTML = await xhrAdapted(url, token);
        await delay(5000, token);
    }
}
reload(document.getElementById("news"), "ticker.html", new CancelToken(…));

With the current proposal, this code will leak memory like crazy. On every iteration of the loop, no less than 4 callbacks are registered on token.promise, and they are never garbage-collected until cancel is called (which might be never) or the token is collected itself. Two of the callbacks are closing over xhr and id respectively, keeping all of the created request and timer objects as well.

If the promisifications of XHR and timeout would set all closure variables to null explicitly after called resolve/reject, and doing nothing in the token callback when they are tested to be null, the memory could be theoretically released. However, this is a rather fragile approach, it's error-prone (easy to forget), and to collect the 4 callbacks when they are proven to do nothing any more would require heavy (and afaik yet unknown) engine sophistication.

For this to work out, we need a way unsubscribe callbacks, from tokens at least and possibly from promises in general. Let's not be too radical and keep it limited to the former.

Imo it is imperative that Promise.cancellablewithCancelToken will unsubscribe the callback from the token (so that it no longer is called on a cancellation request) when the resulting promise is resolved. Whether that should possibly be extended until the promise is settled, not only resolve/reject have been called, remains to be discussed.

I am not sure to what extent the spec should cover this garbage collection. A very simple fix would be to insert a new step

If reject.[[AlreadyResolved]].[[value]] is true, return.

between steps 2 and 3 in §1.3.1.1 Promise.withCancelToken Cancelation Reaction Functions. Then we'd leave it to the engines to figure out when callbacks are supposed to be collectable.
A better idea would be to explicitly remove the callback from the [[CancelTokenPromise]].[[PromiseFulfillReactions]] to make the intent clear. I would model that by adding the PromiseReaction that is registered on the cancellation token to the [[AlreadyResolved]] record of the promise resolving functions (via CreateResolvingFunctions), and then in those function after the [[Value]] test set not only the [[Value]] to false, but also the [[AlreadyResolved]].[[CancelationReaction]].[[Capability]] to empty and the [[AlreadyResolved]].[[CancelationReaction]].[[Handler]] to undefined.

domenic commented 8 years ago

The important thing to remember is that browsers are free to make any of the sort of unobservable optimizations you suggest. They don't belong in the spec.

domenic commented 8 years ago

Closing, but will certainly reopen if I misunderstood and the optimizations you suggest require observable semantic changes.

bergus commented 8 years ago

Yes, please reopen. The additional step I suggested does change the observable behaviour to make the very important garbage collection possible in the first place.

function exanple(cancelTime, resolveTime) 
    return Promise.withCancelToken(new CancelToken(cancel => {
        setTimeout(cancel, cancelTime, "over");
    }), (resolve, reject) => {
        setTimeout(resolve, resolveTime, "done");
        return (e => console.log("It's "+e.message));
   });
}

While example(1000, 2000) (where cancellation happens before resolution) will remain unaffected and log "It's over" after 1s, the behaviour of

example(2000, 1000).then(r => console.log("It's "+r))

would change. When the token is cancelled after the promise is resolved, the cancel action is no more executed (because there usually is nothing to cancel any more). So instead of logging "It's done" and "It's over" after 1 and 2s respectively, it should only log "It's done".

domenic commented 8 years ago

OK. I am going to rename this then since it sounds like your issue is not about memory leaks but instead about Promise.withCancelToken sometimes executing its cancelation action twice.

bergus commented 8 years ago

No, the cancel action is not executed twice. It's just executed after all work is done, when there is nothing to cancel any more. And this really does create a memory leak - try my example and monitor the number of callbacks on token.promise.

domenic commented 8 years ago

I see, you just used "It's " + r in both.

The number of callbacks on token.promise is a spec-internal device and does not govern how much memory will be consumed. Callbacks which the implementation knows will never fire are not going to be retained.

I guess there may be a remaining semantic problem with Promise.withCancelToken in edge cases, potentially, but it sounds like there is a misunderstanding of how the memory model works here. I can investigate the Promise.withCancelToken behavior.

bergus commented 8 years ago

Callbacks which the implementation knows will never fire are not going to be retained.

Yeah, one can only hope so. The problem is really that they will fire, they just don't do anything observable. Detecting that is much harder.

And no, this is not an edge case. The problem is that cancel actions (the functions installed on token.promise) will be retained (and executed!) even after the action they could have cancelled is finished, as long as the token lives longer than the single promise - which is pretty much always.

domenic commented 8 years ago

It's not hard if it's written into the spec that they won't do anything observable.

bergus commented 8 years ago

Sure, but writing the spec in that way makes it hard to see that they need to be garbage-collected. You didn't realise the importance of this problem immediately, or didn't even think about it in the first place, so it might be doubted that implementors will. Compare this to the (Weak)Map/(Weak)Set section of the spec which talks about the intended garbage collection behaviour extensively. So if you choose to pick up my first suggested change you should probably add a Note to it that not executing the callback is done to make it garbage-collectable even before the token.

domenic commented 8 years ago

Sure, we can try to add an note, although the committee has in the past preferred to not include such things, so my guess is that has a good chance of being removed after committee review.

benjamingr commented 8 years ago

It's important to note that the spec doesn't typically specify cleanup or garbage collection anywhere anyway. The term "garbage collection" does not appear in the spec even once and there are in fact (typically older, or run once) implementations of JavaScript that do not collect garbage at all.

The spec of WeakMap/WeakSet doesn't mention garbage collection at all - weak maps just have constraints (like no primitive keys and no iteration) that allow implementors to make the map week. While it is obvious that this is the intended behavior the spec does not actually require it anywhere from what I can tell.

stefanpenner commented 8 years ago

I have a question, @domenic is it intended that leak free tokens cannot be implemented in user-land (always requiring platform magic to not retain handle callbacks)?

For example, Token.race could be approximately implemented in user-land as:

Token.race = function(tokens) {
  let cancel;
  const token = new this(_cancel => cancel = _cancel);

  tokens.forEach(token => token.promise.then(cancel));

  return token;
};

Token.race([short, long]);
// cancelShort

This will leak the handler added to long.promise via long.promise.then(cancel),

related note from the spec:

If operation is "handle", an implementation should not hold a reference to promise in a way that would interfere with garbage collection. An implementation may hold a reference to promise if operation is "reject", since it is expected that rejections will be rare and not on hot code paths.

domenic commented 8 years ago

In general it's anticipated that userland implementations will not be able to accomplish all of the same semantics as built-in ones, yes.

stefanpenner commented 8 years ago

@domenic although not perfect, I think does seem possible for polyfills to implement the Token.race in a leak free way, at-least when their own promises are used.

Token.race = function(tokens) {
  let cancel;
  const token = new this(_cancel => cancel = _cancel);

  const detachableReactions = tokens.map(token => {
    // something a user-land polyfill could do..
    return $$$_attemptThenAndCaptureDetachableReaction(token.promise, cancel);
  });

  token.promise.then(() => detachableReaction.forEach(r => r.detach());

  return token;
};

Kinda icky, and also demonstrates that mixing user-land and native stuff here will also cause leaks.

Although icky, in-general a cancellation primitive even with the above semantics would very much be a net positive in apps I have worked on 👍

Question: can we do better?

bergus commented 8 years ago

Now that is an excellent point why we need Token.race as a builtin :-) It would not only leak the cancel hander but also the constructed token that the handler can cancel. An engine would need to determine when cancel does no more have any observable side effects (that is when token.promise has ~~no handlers any more~~ no handlers that do anything) and then drop it. And token would need to listen to the tokens again when there is a new handler added. I guess an engine could achieve this even for a userland promise.then(cancel) by detecing a canceller function, but for promise.then(e => cancel(e)) it will become near impossible.

Btw, that "DetachableReaction" for arbitrary promises is the idea on which my proposal was based on.

stefanpenner commented 8 years ago

Btw, that "DetachableReaction" for arbitrary promises is the idea on which my proposal was based on.

Ya in an earlier spike I also ended-up with something along the lines of DetachableReaction. Although implemented as a non-promise construct... Ultimately that path also ran into issues, ones that promises on tokens did end up resolving. That approach did deal with the leak issue. Its tricky getting something perfect here.

domenic commented 8 years ago

Unfortunately @bergus has continued to use this repo as a way to push his proposal, so I'm going to have to moderate this thread. Please keep the discussion focused on this repository.

benjamingr commented 8 years ago

Userland libraries don't leak here, only native promises + polyfilled tokens. Even then, it's just latency and not a leak and I don't think it's a big deal since the expensive part (the promises) were taken care of.

I don't think this "leak" is a big deal at all.

bergus commented 8 years ago

@benjamingr Even userland libraries need to go to great lengths not to leak here, I've not yet seen a token implementation that doesn't. Which of the leaks do you think is not a problem? We've described two different scenarios here.

Btw, with the semantics I described in the OP there would be a way to detach reactions from the token promise, it's just a bit convoluted:

Token.race = function(tokens) {
  const {token, cancel} = this.source();

  function onCancelled(e) {
    for (const detach of detachers)
      detach();
    cancel(e);
  }
  const detachers = tokens.map(t => {
    let resolve;
    Promise.withCancelToken(t, res => { // this subscribes `onCancelled` on `t`
      resolve = res;                    // and drops it when `resolve` is called
      return onCancelled;
    });
    return resolve;
  });
  return token;
};

However, this only solves the case where the short-lived tokens are cancelled regularly without the long-lived one being affected. It does still leak when the race token is not cancelled but has no subscribers (that do anything).

stefanpenner commented 8 years ago

it's just latency and not a leak

can you explain this in more detail?

stefanpenner commented 8 years ago

@domenic you got me thinking, if we had the concept of a WeakArray or WeakList it would allow user-land to implement many interesting things. Unfortunately, I think it would expose GC semantics to user-space, which doesn't sound like a good idea.

It is good that a solution to the leak I am concerned about is possible for native implementations at-least. As it stands it still doesn't seem right, but I'll noodle on it more and try to come up with something concrete.

Userland libraries don't leak here, only native promises + polyfilled tokens.

@benjamingr I don't believe this to be true, as todays user-land promises don't have a weak reference to its enqueued fulfillment reaction value's (the success handlers). The spec in this proposal aims to address this by recommending implementations "weakly" retain these. This mitigates the handler leak problem, but I don't believe can be implemented in all cases in user-land.

bergus commented 8 years ago

@stefanpenner I had that in mind as well, but I don't think a WeakArray or WeakRef can solve this problem, a token should still have a strong reference to its handlers. They should not be collected (and not called) when nothing else references them any more, they should be dropped (and not called) when their execution would not have any observable side effects. This could be achieved by very sophisticated program analysis (unlikely), infererenced for some special cases by the programmer (expected for native implementations), or done via explicit signaling ("detach", "unsubscribe", "show disinterest") that would be usable also for userland implementations.

stefanpenner commented 8 years ago

@bergus could you provide a quick concrete example where the Weak relationship breaks down? I haven't given it much thought, and as you have I would love to see an example or two.

bergus commented 8 years ago

@stefanpenner

function example(act) {
    const {token, cancel} = CancelToken.source();
    token.promise.then(act); // some side effect (strongly referenced)
    const a = CancelToken.source(),
          b = CancelToken.source();
    a.token.promise.then(cancel); // strong reference from `a.token` to `cancel` (and `token` and `act`)
    b.token.subscribeWeak(cancel); // weak reference from `b.token` to `cancel` (etc)
    return {cancelA: a.cancel, cancelB: b.cancel}; // notice the cancel function reference the respective tokens
}

Now the following would work well

({cancelA, cancelB} = example(()=>console.log("executed"))); // make them global variables for simplicity
// later
cancelA() // will cause the log
cancelA = null; // destroy, now `a.token`, `cancel` and `token` can be collected
                // notice how the latter two were only weakly referenced by `cancelB`
// later
cancelB(); // a no-op now

but the following would go haywire:

({cancelA, cancelB} = example(()=>console.log("maybe")));
// later
cancelA = null; // destroy, now `a.token`, `cancel`, `token` and `act` can be collected
                // notice how the latter were only weakly referenced by `cancelB`
// later
// in the meantime, the garbage collector might have run, or not
cancelB(); // *should* cause the log, right? But maybe `act` is already destroyed

In general, we cannot weakly reference event handlers, as the event source is often the only object holding a reference to them, so it could as well drop them right away. Weak references in FRP systems only work because the output nodes are referenced from root objects and have strong references to their dependencies.

zenparsing commented 7 years ago

I'm not sure that the change to withCancelToken fixes this issue, since the semantics of cancelToken.promise.then imply storing strong references to promise handlers. For a cancel token which is long-lived, the promise handler will then far outlive the async operation that registers the handler.

It seems to me that providing some way to deregister a cancellation listener is a design requirement. Thoughts?

bergus commented 7 years ago

@zenparsing

the semantics of cancelToken.promise.then imply storing strong references to promise handlers.

Yes, that's exactly the problem. The fix to withCancelToken was small but bears heavy implications - implementors are expected to safely drop the references to the promise's then handlers when they are guaranteed not to be executed any more (which is what the change asserts).

I agree on deregistering being a design requirement, I guess this should be stated more clearly (and the proposal be re-evaluated in light of that). It remains an open question whether this deregistration should be used internally (withCancelToken, CancelToken.race) only, or whether it should be (needs to be?) exposed in the API. Imo, all then callbacks should be cancellable (deregisterable).

tc39 / proposal-cancelable-promises

Allow for garbage collection of cancellation handlers on long-lived tokens #52