TransformStream cleanup using "Transformer.cancel"

lucacasonato commented 1 year ago

This commit adds a cancel hook to Transformer. This allows users to perform resource cleanup when the readable side of the TransformStream is cancelled, or the writable side is aborted.

To preserve existing behavior, when the readable side is cancelled with a reason, the writable side is always immediately aborted with that same reason. The same is true in the reverse case. This means that the status of both sides is always either closed, erroring, or erroring when the cancel hook is called.

flush and cancel are never both called. As per existing behaviour, when the writable side is closed the flush hook is called. If the readable side is cancelled while a promise returned from flush is still pending, cancel is not called. In this scenario the readable side ends up in the "errored" state, while the writable side ends up in the closed state.

I have opted for a cancel hook instead of a finally hook, to mirror the API in WritableStream - it has one hook for successful completion (close), and one hook for errored completion (abort). Transformer already has a flush hook for successful completion. The logical addition is an cancel hook for errored completion.

Closes #1212

Open questions:

Is cancel the right name? Mirrors ReadableStream, but we use abort for WritableStream.
Is the immediate cancellation behaviour correct?
- Ie should call the cancel hook before or after cancelling the underlying streams?
- Calling it before means that current behaviour is changed (number of microticks from readable.cancel() to the writable actually being aborted). This is the current behaviour.
- Calling it after means that you can not modify the reason passed to the other side using the cancel hook. This is different to abort and cancel hooks in WritableStream and ReadableStream respectively.

[x] At least two implementers are interested (and none opposed):
- WebKit (https://github.com/whatwg/streams/pull/1283#issuecomment-1738663536)
- Chromium (https://github.com/whatwg/streams/pull/1283#issuecomment-1738960442)
- (Deno)
[x] Tests are written and can be reviewed and commented upon at:
- https://github.com/web-platform-tests/wpt/pull/40453
[x] Implementation bugs are filed:
- Chromium: https://bugs.chromium.org/p/chromium/issues/detail?id=1488142
- Gecko: https://bugzilla.mozilla.org/show_bug.cgi?id=1856103
- WebKit: https://bugs.webkit.org/show_bug.cgi?id=262424
- Deno: https://github.com/denoland/deno/issues/20741
- Node.js: https://github.com/nodejs/node/issues/49971
[x] MDN issue is filed: https://github.com/mdn/content/issues/29375

Preview | Diff

ricea commented 1 year ago

Is cancel the right name? Mirrors ReadableStream, but we use abort for WritableStream.

Since it's used for both the cancel and abort cases, it might be good to come up with a third name to avoid confusion. stop is one possibility.

Is the immediate cancellation behaviour correct?

Ie should call the cancel hook before or after cancelling the underlying streams?

Calling it before means that current behaviour is changed (number of microticks from readable.cancel() to the writable actually being aborted). This is the current behaviour.

Calling it after means that you can not modify the reason passed to the other side using the cancel hook. This is different to abort and cancel hooks in WritableStream and ReadableStream respectively.

Interesting. Calling it before would be better in terms of consistency with the other APIs. I personally think that changing the number of microticks is an acceptable breakage, but others may disagree.

lucacasonato commented 1 year ago

Interesting. Calling it before would be better in terms of consistency with the other APIs. I personally think that changing the number of microticks is an acceptable breakage, but others may disagree.

For reference, exactly one WPT is broken by this.

lucacasonato commented 1 year ago

Since it's used for both the cancel and abort cases, it might be good to come up with a third name to avoid confusion. stop is one possibility.

What about terminate? This aligns with controller.terminate

ricea commented 1 year ago

What about terminate? This aligns with controller.terminate

controller.terminate is rather a niche and subtle operation, which is why we gave it a long name.

This operation, however, is something we'd like people to implement if they need to do cleanup, so a short friendly name seems appropriate.

saschanaz commented 1 year ago

dispose? That's the name "owning" proposal uses by "dispose steps" and also by the upcoming Symbol.dispose proposal which might ultimately call this (if we implement it for stream classes)

lucacasonato commented 1 year ago

I don't particularly like dispose, as it implies it is called both on successful and errored close (because that is what Symbol.dispose does. This however is only called on the error case, because successful disposal is already handled through flush.

I think we need something that implies early unsuccessful termination. abort and cancel fit the bill here, except that we already use them elsewhere so using the same word may be confusing. I'm not super strongly concerned with confusion here tho, as the use case is very similar to those hooks in RS and WS.

lucacasonato commented 1 year ago

@domenic @MattiasBuelens @saschanaz Do you have any thoughts on the "Is the immediate cancellation behaviour correct?" point? I'm leaning towards the currently implemented behaviour because it minimizes chance of breakage, but I do also sympathize with @ricea's points above.

domenic commented 1 year ago

What about cancelOrAbort for the name? Since it's literally called when either the readable side is canceled, or the writable side is aborted? It's not short though.

I think calling it before is better. I agree that changing the number of microtasks is acceptable.

lucacasonato commented 1 year ago

I think calling it before is better.

Ok, I'll change that. This opens up a question on error handling:

const ts = new TransformStream({
  async cancel(reason) {
    console.log("t.cancel start with", reason);
    await delay(100);
    console.log("t.cancel end");
    throw "t.cancel error";
  },
});
console.log("ts constructed");
ts.readable.cancel("rs.cancel error")
  .then(() => console.log("rs.cancel fulfilled"))
  .catch((err) => console.log("rs.cancel rejected with", err));
ts.writable.abort("ws.abort error")
  .then(() => console.log("ws.abort fulfilled"))
  .catch((err) => console.log("ws.abort rejected with", err));
console.log("started rs.cancel and ws.abort");

I think the logical ordering of logs is:

# a-1
ts constructed
t.cancel start with rs.cancel error
started rs.cancel and ws.abort
t.cancel end
rs.cancel rejected with t.cancel error
ws.abort rejected with t.cancel error

But the following are also possible:

# a-2
ts constructed
t.cancel start with rs.cancel error
started rs.cancel and ws.abort
ws.abort fulfilled
t.cancel end
rs.cancel rejected with t.cancel error

# a-3
ts constructed
t.cancel start with rs.cancel error
started rs.cancel and ws.abort
t.cancel end
rs.cancel fulfilled
ws.abort fulfilled

Also consider:

const ts = new TransformStream({
  flush() { console.log("t.flush") },
  async cancel(reason) {
    console.log("t.cancel start with", reason);
    await delay(100);
    console.log("t.cancel end");
    throw "t.cancel error";
  },
});
console.log("ts constructed");
ts.readable.cancel("rs.cancel error")
  .then(() => console.log("rs.cancel fulfilled"))
  .catch((err) => console.log("rs.cancel rejected with", err));
ts.writable.close()
  .then(() => console.log("ws.close fulfilled"))
  .catch((err) => console.log("ws.close rejected with", err));
console.log("started rs.cancel and ws.close");

Here similar options are possible:

# b-1
ts constructed
t.cancel start with rs.cancel error
started rs.cancel and ws.close
t.cancel end
rs.cancel rejected with t.cancel error
ws.close rejected with t.cancel error

# b-2
ts constructed
t.cancel start with rs.cancel error
started rs.cancel and ws.close
ws.close fulfilled
t.cancel end
rs.cancel rejected with t.cancel error

# b-3
ts constructed
t.cancel start with rs.cancel error
started rs.cancel and ws.close
t.cancel end
rs.cancel fulfilled
ws.close fulfilled

These each also have effect on whether the error is visible in writer.closed or not. In a-1 and b-1 examples writer.closed would reject, in a-2, a-3, b-2, and b-3 the it resolves to undefined.

Does anyone have preferences here? I think a-1/b-1 make the most sense.

saschanaz commented 1 year ago

Does anyone have preferences here? I think a-1/b-1 make the most sense.

I fail to see why any of the resulting promises should not reject, so I also prefer a-1/b-1 (and I also think it's okay to have extra microtask).

lucacasonato commented 1 year ago

What about:

const ts = new TransformStream({
  flush() { console.log("t.flush") },
  async cancel(reason) {
    console.log("t.cancel start with", reason);
    await delay(100);
    console.log("t.cancel end");
    throw "t.cancel error";
  },
});
console.log("ts constructed");
ts.writable.close()
  .then(() => console.log("ws.close fulfilled"))
  .catch((err) => console.log("ws.close rejected with", err));
ts.readable.cancel("rs.cancel error")
  .then(() => console.log("rs.cancel fulfilled"))
  .catch((err) => console.log("rs.cancel rejected with", err));
console.log("started rs.cancel and ws.close");

ts constructed
t.flush
started ws.close and rs.cancel
ws.close fulfilled
rs.cancel fulfilled

I'm trying to figure out if it's observable from the outside that the RS was closed through a cancellation rather than a "clean close" (ie is "rs.cancel error" exposed through a reader.read() promise rejection)?

lucacasonato commented 1 year ago

I have now implemented a v2 that runs the cancel algorithm prior to cancel / abort of the other side. I have opted for the approach where all of rs.cancel ws.close and ws.abort will all either resolve or reject (never one resolves one rejects), and that the first one always wins.

https://github.com/whatwg/streams/pull/1283/commits/5b00d087808627517d5b35f765eaf5a1ecb2dc5b

This required updating some test expectations (the ones making the assumption that ws.abort and rs.cancel are sync, which due to micotask changes is not the case anymore).

https://github.com/web-platform-tests/wpt/pull/40453/commits/41567169c2658d99482a75214a60c575808e7ef2

PTAL!

lucacasonato commented 1 year ago

For the test part, could we replace await delay(0); with something else?

I could replace them with a fixed set of queueMicrotask, but I instead went with the approach used in other streams tests. It exists only to drain the current microtask queue before the next statement, not because the tests are timing sensitive.

lucacasonato commented 1 year ago

@ricea @domenic Could you take another look?

lucacasonato commented 1 year ago

I have one editor approval - does anyone else still want to take a look, or is this good to merge? @MattiasBuelens do we usually do a "final comment period" for stream spec PRs to help keep things moving?

domenic commented 1 year ago

This needs implementer interest before it's ready to merge. Although I trust @MattiasBuelens , I and other editors might be holding off on further review until we're clear on whether this has a cross-browser future.

lucacasonato commented 1 year ago

Ok, would be great to start getting some explicit feedback on interest from Chrome / Firefox / Safari then.

I would urge folks to look at this with some urgency - it's a very serious omission in TransformStream that makes implementations of many transformer algorithms impossible unless cleanup is performed exclusively via GC. There is significant user interest, both from us at Deno, from folks using Cloudflare Workers (cc @jasnell), and from other interested parties (see the original issue #1212).

@domenic is Chrome interested in shipping this? It sounded in this issue as if you may be interested? Looking it over, it seems like a relatively trivial implementation for Chromium :)

@saschanaz can I note down your LGTM as "interested in shipping" for Gecko?

@annevk is WebKit interested? (if you are the wrong person to ping, please forward :) )

saschanaz commented 1 year ago

LGTM spec-wise, but not actively pursuing implementation-wise. If others are interested to ship it then Mozilla will too.

domenic commented 1 year ago

@ricea is the relevant Chromium implementer

lucacasonato commented 1 year ago

@ricea Is Chrome interested in shipping this?

youennf commented 1 year ago

This addition makes sense to me as we have more and more web platform objects that benefit from being explicitly closed.

ricea commented 1 year ago

Yes, Chrome is interested. Sorry for the delay.

lucacasonato commented 1 year ago

Fantastic, thank you folks! I'll open implementation bugs shortly so we can get this landed 😃

lucacasonato commented 1 year ago

All implementation bugs and the MDN issue have been filed, and the WPT submodule has been updated. This PR is now ready to land :)

domenic commented 1 year ago

I think all your issues have the wrong titles, using "cleanup" instead of "cancel" :). I'll merge this though.

lucacasonato commented 1 year ago

@domenic The cancel hook allows for cleanup of transformers backing TransformStreams, so I think it makes some sense?

domenic commented 1 year ago

There is no "transformer.cleanup" property though.

lucacasonato commented 1 year ago

Oh I realize my mistake now. Will fix. Thx

whatwg / streams

TransformStream cleanup using "Transformer.cancel" #1283