Spongman opened this issue 8 years ago:

is there a way to know if the download has been cancelled by the user?
The fix for the Chrome issue (#638494) got merged a few days ago into upstream (https://chromium-review.googlesource.com/c/chromium/src/+/3347484). Still a long road to stable/end-users, but it's something.
Getting a different behaviour on Canary: `write(xx)` is throwing `undefined`; digging into it to see if I can squeeze something out of it, although no `console.log('user aborted')` is being fired :/
Okay, below are my findings:

For Chrome Canary version 99.0.4828.0, I was able to handle user-level cancellation by simply checking if `write` would throw (not the cleanest handling, but it does work).

Example:
```js
const streamSaver = StreamSaver.createWriteStream(`abc.zip`);
this.fileStream = streamSaver.getWriter();
readable.on('data', (d) => {
  if (this.cancelled) return;
  this.fileStream.write(d)
    .catch((e) => { this.cancelled = true; this.abort(); });
});
```
However, testing on Firefox, the web worker does print `user aborted`, but nothing was implemented there.
To replicate the possible Chrome implementation on Firefox, and possibly other browsers, I had to send the abort request to the main thread and then abort there, the same way a manual abort would.
@jimmywarting do you believe this commit could be merged, or are there any other cases I'm not handling properly? https://github.com/gwdp/StreamSaver.js/commit/f9e375e6795a9b64049e08c47c6f7fda2e12b858
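For reference, a rough sketch of the idea in that commit (the `USER_ABORT` message and the `messagePort` wiring here are illustrative, not the exact code):

```js
// service worker: the stream we respond with gets cancelled when the user
// aborts the download in the browser UI
new ReadableStream({
  // ... pull/enqueue logic ...
  cancel(reason) {
    // notify the page so it can run the same code path as a manual abort
    messagePort.postMessage({ type: 'USER_ABORT', reason: String(reason) });
  }
});

// main thread: treat that message like a manual abort
messagePort.onmessage = ({ data }) => {
  if (data.type === 'USER_ABORT') {
    fileStream.abort().catch(console.warn);
  }
};
```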
@gwdp hmm, yea maybe.
However, I would like to rewrite the whole main-thread <-> service-worker communication.
When I wrote my native-file-system-adapter, I took what I learned from StreamSaver and fixed the problems I had there.
...Then we use a `MessageChannel` to send each chunk off to the service worker, where I have created a `ReadableStream` in order to save it. The current problem is that when we do `this.fileStream.write(d)`, the chunk is immediately sent off to the service worker via `postMessage` and `fileStream.write(d)` is resolved right away. This causes a problem because you do not know when/if it has been written, so it can't handle the backpressure, and you will start to have a memory problem if you write data faster than it is able to write to the disk.
So rather than pushing data from the main thread to the service worker, I instead built a `ReadableStream` in the service worker that pulls data, asking the main thread for more. That way, when `fileStream.write(d)` has resolved, you know for sure that the data has been written to the disk and that the service worker is asking for more data.
The source is inspired by @MattiasBuelens's remote-web-streams. I would like to use this: it is a tighter glue between two worker threads, and much more similar to how a transferable stream is supposed to work, communicating back and forth with backpressure.
edit: I wish he would have named it transferable-stream-shim or polyfill so it could be easier to google :P
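To illustrate the pull-based idea, here is a minimal sketch (not the actual native-file-system-adapter code; `exposeSource` and `fromPort` are made-up names). The consumer side creates a `ReadableStream` whose `pull()` asks the producer for one chunk at a time, so a new chunk is only requested after the previous one has been consumed:

```js
// producer side (e.g. the main thread): answer each "pull" with one chunk
function exposeSource(port, getNextChunk) {
  port.onmessage = async () => {
    const { value, done } = await getNextChunk();
    port.postMessage({ value, done });
  };
}

// consumer side (e.g. the service worker): a stream that pulls over the port
function fromPort(port) {
  return new ReadableStream({
    pull(controller) {
      return new Promise((resolve) => {
        port.onmessage = ({ data }) => {
          if (data.done) controller.close();
          else controller.enqueue(data.value);
          resolve(); // only now may the stream machinery pull again
        };
        port.postMessage('pull'); // ask the producer for one more chunk
      });
    }
  });
}
```

Because `pull()` is only called when the consumer wants more data, backpressure propagates all the way back to the producer.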
Wow :exploding_head:. If I understood properly, that would involve a major refactor of the code (a simplification as well); sounds like the next release of StreamSaver? :nerd_face:
I have been using StreamSaver for a while now and I found out about native-file-system-adapter only yesterday. After a good read of the code and its usage, I still believe StreamSaver has its own growing use case that is not entirely replaceable by the native fs adapter.
In my case, I have been using it for compressing stuff before sending it to the fs; however, on my to-do list is customizing the worker to do the compression there and not in the client, which is currently causing me a double backpressure problem when bandwidth spikes and disk/CPU are busy.
If your proposal is to use https://github.com/MattiasBuelens/remote-web-streams to refactor the communication, I might be able to draft something out in the upcoming week, since I need to do something about this compression problem anyway.
As for the download-cancelled event issue, I believe this will need to be handled anyway; opening a PR for that so other folks can have this fixed in the current version :)
edit: I wish he would have named it transferable-stream-shim or polyfill so it could be easier to google :P
I know, I know. :P Granted, I made that library before transferable streams were defined in the spec, so I didn't know what the spec would end up calling it.
Also, I'm hesitant to call it a "polyfill". For it to be a proper polyfill, it would need to patch `postMessage()` and the `message` event, and then transform the `message` object to find and replace any streams that also appear in the `transferList`. That means partially re-implementing StructuredSerializeInternal, so you can replace a `ReadableStream` in a message like `{ deeply: { nested: { stream: readableStream } } }`. And I really couldn't be bothered with that. :P
For Chrome Canary version 99.0.4828.0, I was able to handle user-level cancellation by simply checking if `write` would throw (not the cleanest handling, but it does work).
One way to check when `close()` is called on the writer, for example in a different thread:
```js
try {
  if (writable.locked) {
    await writer.ready;
    await writer.write(value);
  }
} catch (e) {
  console.warn(e.message);
}
```
If necessary, handle "cannot write to a closing writable stream" error(s).
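For example, the guard can sit directly in the producing loop (a sketch; `readChunks()` here is a hypothetical async iterable of chunks):

```js
for await (const value of readChunks()) {
  try {
    if (writable.locked) {
      await writer.ready;        // respect backpressure
      await writer.write(value);
    }
  } catch (e) {
    // e.g. "Cannot write to a closing writable stream"
    console.warn(e.message);
    break;
  }
}
```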
One hacky thing I'm doing in StreamSaver is transferring the MessagePort via another message channel, to avoid the so-called "double transfer problem".
More context here: https://github.com/whatwg/streams/issues/276#issuecomment-482749288
It's essential for StreamSaver to work. I don't think remote-web-streams takes this into account?
You can re-transfer the returned `MessagePort` as many times as you want, and then construct the `ReadableStream` in the final realm.
```js
// main.js
const readable = new ReadableStream({ /* ... */ });
const { writable, readablePort } = new RemoteWritableStream();
readable.pipeTo(writable).catch(() => {});
const worker1 = new Worker('./worker1.js');
worker1.postMessage({ readablePort }, [readablePort]);

// worker1.js
const worker2 = new Worker('./worker2.js');
self.onmessage = (event) => {
  const { readablePort } = event.data;
  worker2.postMessage({ readablePort }, [readablePort]);
};

// worker2.js
self.onmessage = (event) => {
  const { readablePort } = event.data;
  const readable = RemoteWebStreams.fromReadablePort(readablePort);
  const reader = readable.getReader();
  // ...
};
```
As long as you only call `fromReadablePort()` once, it should be fine.
Ty for the instructions and the use of `RemoteWebStreams.fromReadablePort`, but isn't that vulnerable to the "double transfer problem"? I don't see any use of MessageChannel in that example; what happens when worker1 is terminated/closed?
@MattiasBuelens wouldn't it kind of be like this instead?
```js
// To avoid double transfer problem
// 1. create a `readablePort` that is posted to a MessageChannel
// 2. then send that MessageChannel to worker1 and then forward it to worker2
// 3. worker #2 listens for one message on the MessageChannel that includes the `readablePort`
// 4. worker #2 sets up a readable with `RemoteWebStreams.fromReadablePort(readablePort)`
// 5. worker1 can now be terminated
const readable = new ReadableStream({ /* ... */ })
const { writable, readablePort } = new RemoteWritableStream()
readable.pipeTo(writable).catch(console.error)

const bridge = new MessageChannel()
bridge.port1.postMessage({}, [readablePort])

const worker1 = new Worker('./worker1.js')
worker1.postMessage({}, [bridge.port2])

// --------------------------------------------------------------------------------
// worker1.js (only forwards the MessageChannel bridge to worker2)
const worker2 = new Worker('./worker2.js')
globalThis.onmessage = evt => worker2.postMessage({}, [evt.ports[0]])

// --------------------------------------------------------------------------------
// worker2.js
globalThis.onmessage = evt => {
  const [bridgePort] = evt.ports
  // listen for one message on the bridgePort that will include the readablePort
  bridgePort.onmessage = evt => {
    // readablePort is now directly transferred from main thread to worker2 without
    // being piped through worker1
    const [readablePort] = evt.ports
    const readable = RemoteWebStreams.fromReadablePort(readablePort)
    const reader = readable.getReader()
    // clean up
    bridgePort.onmessage = null
    bridgePort.close()
  }
}
```
but isn't that vulnerable to the "double transfer problem"? I don't see any use of MessageChannel in that example; what happens when worker1 is terminated/closed?
Actually, `MessagePort`s can safely be re-transferred. When you transfer a port, you stop receiving messages in your original realm, and the new realm can start receiving its messages instead. Even if the intermediate realm gets terminated, the port still works.
It doesn't really work with the example of `worker1` spawning a new `worker2`, because `worker2` cannot outlive `worker1` (terminating `worker1` also terminates `worker2`). But you can easily demonstrate this by passing a port back and forth between `main` and `worker`:
```js
// main.js
const readable = new ReadableStream({
  _i: 0,
  pull(c) {
    c.enqueue(++this._i);
    if (this._i === 10) {
      c.close();
    }
  }
});
const { writable, readablePort } = new RemoteWebStreams.RemoteWritableStream();
readable.pipeTo(writable).catch(console.error);

// Send the port to the worker
const worker = new Worker("./worker.js");
worker.postMessage({ readablePort }, [readablePort]);

// Receive the port back from the worker
worker.addEventListener('message', evt => {
  // Terminate the worker
  // This shows that the transferred port continues to work, and no longer "goes through" the worker
  worker.terminate();

  // Construct the transferred stream and consume it
  const { readablePort } = evt.data;
  const readable = RemoteWebStreams.fromReadablePort(readablePort);
  readable.pipeTo(new WritableStream({
    write(chunk) {
      console.log('received', chunk);
    }
  }));
});

// worker.js
globalThis.onmessage = evt => {
  // Send the port back to main thread.
  const { readablePort } = evt.data;
  globalThis.postMessage({ readablePort }, [readablePort]);
};
```
The "double transfer problem" with streams is slightly different. We always construct a new MessageChannel
when transferring a ReadableStream
, even if that stream was already transferred before. You can simulate this behavior with:
```js
// worker.js
globalThis.onmessage = evt => {
  // Construct a ReadableStream from the given port
  const readable = RemoteWebStreams.fromReadablePort(evt.data.readablePort);

  // Transfer the ReadableStream again
  // This creates a new MessageChannel, independent from the original port
  const { writable, readablePort } = new RemoteWebStreams.RemoteWritableStream();
  // Pipe the two streams together. This is what "connects" the original port and the newly created port.
  readable.pipeTo(writable).catch(console.error);

  // Send the new message port back to main thread
  globalThis.postMessage({ readablePort }, [readablePort]);
};
```
Here, it's not possible to terminate the worker, since its `pipeTo()` is passing chunks received from the first port (`evt.data.readablePort`) to the second port (`readablePort`). That's the root cause of the double transfer problem.
In order to solve this, the streams specification needs to be changed such that, if a `ReadableStream` was constructed through the "transfer-receiving steps", then the "transfer steps" should not set up a new `MessageChannel` and `pipeTo()`. Instead, it should re-use the original `MessagePort` that we used to construct the stream. That way, we get the same behavior as in the first example, where the worker simply re-transfers the port.
Of course, the streams specification has to deal with lots of other annoying details, such as "what if the worker first reads a couple of chunks from the stream and then re-transfers it?". remote-web-streams doesn't attempt to solve that right now.
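You could approximate the proposed behavior in user land with something like this (a hypothetical sketch on top of remote-web-streams; `receiveReadable` and `transferReadable` are not part of the library, and this ignores the "already read a couple of chunks" problem above):

```js
// Remember which port a stream was received over, and re-transfer that same
// port instead of creating a new channel + pipe.
const receivedPorts = new WeakMap();

function receiveReadable(port) {
  const readable = RemoteWebStreams.fromReadablePort(port);
  receivedPorts.set(readable, port);
  return readable;
}

function transferReadable(readable, target) {
  const port = receivedPorts.get(readable);
  if (port) {
    // The stream came in over a port: forward that original port directly,
    // so this realm drops out of the chain entirely.
    target.postMessage({ readablePort: port }, [port]);
  } else {
    // A locally created stream: set up a new channel as usual.
    const { writable, readablePort } = new RemoteWebStreams.RemoteWritableStream();
    readable.pipeTo(writable).catch(() => {});
    target.postMessage({ readablePort }, [readablePort]);
  }
}
```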
Well, I'm not actually using two web workers... I have:
@gwdp For completeness: when closing the writable side in the pattern, release the lock as well, so the `if` condition is true at that instance:
```js
await writer.close();
await writer.closed;
console.log('Writer closed.');
writer.releaseLock();
```
If a handler is attached where `write()` is called, multiple errors can be handled in the interim while the handler is dispatched multiple times in the same span of time, though it will still, in my case, enqueue all necessary chunks.
Well, I'm not actually using two web workers...
I believe it should work regardless of whether you're using a worker, a service worker, or an iframe. `MessagePort` and `postMessage()` should work the same in all of them.
But yes, with your current approach, you get the benefits of using native transferable streams, while also avoiding the double transfer problem. With remote-web-streams, you wouldn't be able to use native transferable streams; you'd be manually passing message ports around instead.
Hopefully we can one day fix this double transfer problem in the spec, and then you can finally ditch your "message channel containing a single message with a readable stream" workaround too.
Once that is done I terminate the 2nd window by closing/removing it, and then the stream gets closed or something. I'm not sure if it's a bug or if it's something that isn't spec'ed.
Yes, it is specified, to an appreciable degree: https://github.com/w3c/ServiceWorker/issues/980. The `ServiceWorker` has no `WindowClient` or `Worker` client at that point.
Depending on when the `ServiceWorker` is constructed and how long the download takes, persistent data can be lost if it is not cached or stored somewhere outside of the `ServiceWorker`. The fact that a `MessageChannel` handler is used in the `ServiceWorker`, and a `ReadableStream` is being read from/written to in the `ServiceWorker`, has no impact on the `ServiceWorker`'s 5-minute inactivity limit for events. If a variable is declared, the `ServiceWorker` effectively deletes the variable and restarts again, disentangling all `MessagePort`s. That is particularly problematic for extension code in Chromium/Chrome which uses MV3 `ServiceWorker` and Native Messaging (https://bugs.chromium.org/p/chromium/issues/detail?id=1152255; https://bugs.chromium.org/p/chromium/issues/detail?id=1189678). If construction of the `ServiceWorker` and download of all content does not occur within 5 minutes, data can be lost. If pausing and resuming a download is a requirement, it still needs to happen within the 5-minute span; otherwise, again, without storing the data in some temporary container, the data can be lost. To avoid that you can use:

- a `fetch()` request (e.g., to a media element) with a user-defined `ReadableStream` constructed by the client and transferred to the `ServiceWorker` with `navigator.serviceWorker.active.postMessage()`, see https://github.com/w3c/ServiceWorker/issues/882
- an `EventSource` `GET` request from the client, which will keep the `FetchEvent` handler active.

See https://github.com/guest271314/persistent-serviceworker for solutions to persist the same `ServiceWorker` on both Firefox and Chrome.
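For example, the `EventSource` variant can look something like this (a sketch; the `/keep-alive` endpoint name is arbitrary):

```js
// client (page or iframe): hold an EventSource open against the ServiceWorker
const es = new EventSource('/keep-alive');

// ServiceWorker: answer /keep-alive with a never-ending event-stream,
// which keeps the FetchEvent handler active
self.addEventListener('fetch', (event) => {
  if (new URL(event.request.url).pathname === '/keep-alive') {
    const encoder = new TextEncoder();
    const body = new ReadableStream({
      start(controller) {
        // send an SSE ping every 15 seconds
        setInterval(() => controller.enqueue(encoder.encode('data: ping\n\n')), 15000);
      }
    });
    event.respondWith(new Response(body, {
      headers: { 'content-type': 'text/event-stream' }
    }));
  }
});
```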
Once that is done I terminate the 2nd window by closing/removing it, and then the stream gets closed or something. I'm not sure if it's a bug or if it's something that isn't spec'ed.
The simplest solution for that case, without changing too much of your existing code pattern, is to keep the `<iframe>` in the document that created it and post a message to the `ServiceWorker` every 15 seconds, something like:
```js
function closeMessageChannel() {
  parent.postMessage('close', name);
}
navigator.serviceWorker.oncontrollerchange = async (e) => {
  console.log(e);
  closeMessageChannel();
};
navigator.serviceWorker.onmessage = async (e) => {
  parent.postMessage(e.data, name, e.ports);
  try {
    while ((await navigator.serviceWorker.ready).active) {
      (await navigator.serviceWorker.ready).active.postMessage(null);
      await new Promise((resolve) => setTimeout(resolve, 1000 * 15));
    }
    closeMessageChannel();
  } catch (err) {
    console.error(err);
    closeMessageChannel();
  }
};
```
That's what my hidden iframe does (in the case when a webpage uses secure https): the iframe is kept alive, pinging the service worker every now and then... but only if the browser does not support transferable streams.
If it does support them, a "real" transferable ReadableStream is sent to the service worker and responded with `evt.respondWith(new Response(transferred_readableStream))`; then we don't need any ping/keep-alive hack and the 5-minute lifetime does not matter.
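A minimal sketch of that path (the message shape and the URL check are illustrative, not StreamSaver's actual protocol):

```js
// page: transfer a real ReadableStream to the service worker
const { readable, writable } = new TransformStream()
navigator.serviceWorker.controller.postMessage({ readable }, [readable])
// ...write the file data into `writable` as it is produced...

// service worker: respond to the download request with the transferred stream
let transferredReadable
self.onmessage = evt => {
  transferredReadable = evt.data.readable
}
self.onfetch = evt => {
  if (transferredReadable && evt.request.url.endsWith('/download')) {
    evt.respondWith(new Response(transferredReadable, {
      headers: { 'content-disposition': 'attachment; filename="file.bin"' }
    }))
  }
}
```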
Then what issue are you having?
Is this
Once that is done I terminate the 2nd window by closing/removing it, and then the stream gets closed or something. I'm not sure if it's a bug or if it's something that isn't spec'ed.
not the case anymore?
@guest271314 Can we please stop talking about the double transfer problem now? It feels a bit off-topic.
In StreamSaver's case, the "double transfer problem" is only an issue when the site is insecure and the browser doesn't support transferable streams... When the site is insecure it will open up a secure site in a popup for a brief second, only to transfer a transferable stream in the best-case scenario; in the worst case it must transfer a MessagePort to the service worker. After that the popup is closed automatically. This is the only time when we can't use the postMessage hack to keep the service worker alive for a longer duration.
MattiasBuelens reproduced the double transfer problem quite well earlier, in a minimal working example here: https://github.com/whatwg/streams/issues/276#issuecomment-482801869
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Re-transferring streams</title>
</head>
<body>
  <script id="worker-script">
    if (self.importScripts) {
      self.onmessage = (ev) => {
        const rs = ev.data;
        self.postMessage(rs, [rs]);
      };
    }
  </script>
  <script>
    var rs = new ReadableStream({
      start(c) {
        c.enqueue('a');
        c.enqueue('b');
        c.enqueue('c');
      }
    });
    var workerCode = document.getElementById('worker-script').innerHTML;
    var worker = new Worker(URL.createObjectURL(new Blob([workerCode], { type: 'application/javascript' })));
    worker.onmessage = async (ev) => {
      var transferred = ev.data;
      worker.terminate(); // comment this line to "fix" the pipe
      transferred.pipeTo(new WritableStream({
        write(x) {
          console.log(x);
        },
        close() {
          console.log('closed');
        }
      }));
    };
    worker.postMessage(rs, [rs]);
  </script>
</body>
</html>
```
which leads to this chain:
original ReadableStream <--> MessageChannel <--> worker's ReadableStream <--> MessageChannel <--> re-transferred ReadableStream
I have gotten around this problem by sending the message channel, using something like I demonstrated above with MessageChannel here: https://github.com/jimmywarting/StreamSaver.js/issues/13#issuecomment-1013677071, in order to get a direct pipeline from point A to B and a shorter chain.
I go to great lengths to get StreamSaver to work on both secure and insecure pages, without forcing developers to have to install a service worker themselves.
That is why StreamSaver uses such an ugly hack to get things to work. I wanted things to be as easy as just installing a dependency on StreamSaver and expecting it to work with a simple `import` from npm or a CDN.
main thread <-> iframe <-> service worker
StreamSaver never installs a service worker on the developer's domain, and never interferes with an existing service worker that they might have set up already. That's why I have a so-called "man in the middle" iframe/page that makes it easy to use StreamSaver from insecure pages and from sites like jsfiddle, where you can't install a service worker or don't want to go to the trouble of installing one yourself.
But if the site is insecure, then I can't install the service worker and communicate with it in any possible way.
Well, yes, you can. The most secure route is for the user to run code on their own machine, which is possible with Native Messaging.
Can we please stop talking about the double transfer problem now? It feels a bit off-topic.
I'm just trying to provide a perspective from outside looking in. A ServiceWorker is not necessary at all to achieve downloading files locally.
Good luck.
Native Messaging isn't really a good option, as it requires installing an extension and using a native application, and doesn't work in all browsers.
I'm just trying to provide a perspective from outside looking in. A ServiceWorker is not necessary at all to achieve downloading files locally.
Yup,
`<a href="blob:url" download="filename">`
but this is what we are trying to avoid...

Native Messaging isn't really a good option, as it requires installing an extension and using a native application, and doesn't work in all browsers.
Native Messaging works in Firefox and Chrome.
The user installs the extension locally. On Chromium simply select "Developer mode" at chrome://extensions then click "Load unpacked". No external resources or requests or Chrome Web Store are required.
Then you can use the same `<iframe>` loading approach with a local file listed in "web_accessible_resources" and `postMessage()` to the `parent`, without using a `ServiceWorker` (or CDN) at all. If you do want to use a `ServiceWorker` for some reason, e.g., for the user to click on the UI and commence the download, you can still do that.
That allows you to use a `ServiceWorker` on any origin, securely, without necessarily being on an HTTPS page online.
If you are talking about insecure pages, and requesting CDN's, all of those concerns go away when you are only running code already on your own machine.
Alternatively, use the Native Messaging host to download directly to the users' local file system.
I already filed a PR for this capability. You appeared to think unpacking a local extension involves external code or requests; it does not, and it is at least as secure as the MITM code and the CDN requests you are already using.
The user installs the extension locally. On Chromium simply select "Developer mode" at chrome://extensions then click "Load unpacked". No external resources or requests or Chrome Web Store are required.
No visitor on your web site is ever going to do this... and you shouldn't force the user to do this either.
I think you misunderstand what I am conveying. I am not talking about visiting websites, or StreamSaver used by websites. I am talking about the end user who wants to use your StreamSaver code on their own machine. Native Messaging is useful for me and other users to, for example, stream the output of a local speech synthesis engine as a `ReadableStream` piped to `MediaStreamTrackGenerator`, or stream live system audio output to the user on any origin. Chrome prevents both of the former by not outputting audio of `speechSynthesis.speak()` on the tab and by not capturing monitor devices on *nix OSes.
From my perspective, the StreamSaver concept is based on streaming download of files in any browser. So you can basically do the opposite of what I do here: https://github.com/guest271314/captureSystemAudio/tree/master/native_messaging/capture_system_audio (using `postMessage(transfer, [transfer])` for Firefox).
```js
onload = () => {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const id = 'capture_system_audio';
  let port = chrome.runtime.connectNative(id);
  let handleMessage = async (value) => {
    try {
      if (writable.locked) {
        await writer.ready;
        await writer.write(new Uint8Array(JSON.parse(value)));
      }
    } catch (e) {
      // handle cannot write to a closing stream
      console.warn(e.message);
    }
  };
  port.onDisconnect.addListener(async (e) => {
    console.warn(e.message);
  });
  port.onMessage.addListener(handleMessage);
  onmessage = async (e) => {
    const { type, message } = e.data;
    if (type === 'start') {
      port.postMessage({
        message,
      });
      parent.postMessage(readable, name, [readable]);
    }
    if (type === 'stop') {
      try {
        await writer.close();
        await writer.closed;
        console.log('Writer closed.');
        writer.releaseLock();
        port.onMessage.removeListener(handleMessage);
        port.disconnect(id);
        port = null;
        parent.postMessage(0, name);
        onmessage = null;
        await chrome.storage.local.clear();
      } catch (err) {
        console.warn(err.message);
      }
    }
  };
  parent.postMessage(1, name);
};
```
Native Messaging allows you to deploy your concept locally, to your precise specification, without reliance on any external code, for end users.
Perhaps I misunderstand, and your code is written primarily for web site developers exclusively,
StreamSaver.js is the solution to saving streams on the client-side. It is perfect for webapps
not individual users who want to run your gear locally on any site they happen to be visiting.
Again, good luck.