jimmywarting / StreamSaver.js

StreamSaver writes streams to the filesystem directly and asynchronously
https://jimmywarting.github.io/StreamSaver.js/example.html
MIT License

download cancelled event? #13

Open Spongman opened 8 years ago

Spongman commented 8 years ago

is there a way to know if the download has been cancelled by the user?

gwdp commented 2 years ago

The fix for Chrome issue #638494 got merged into upstream a few days ago (https://chromium-review.googlesource.com/c/chromium/src/+/3347484). Still a long road to stable/end users, but it's something.

Getting a different behaviour on Canary: write(xx) is throwing undefined. Digging into it to see if I can squeeze something out of it, although no console.log('user aborted') is being fired :/

gwdp commented 2 years ago

Okay, below are my findings:

For Chrome Canary version 99.0.4828.0 I was able to handle user-level cancellation by simply checking whether write would throw (not the cleanest handling, but it does work). Example:

const streamSaver = StreamSaver.createWriteStream(`abc.zip`);
this.fileStream = streamSaver.getWriter();
readable.on('data', (d) => {
  if (this.cancelled) return;
  this.fileStream.write(d)
    .catch((e) => { this.cancelled = true; this.abort(); });
});

However, testing on Firefox, the web worker does print 'user aborted' but nothing was implemented there to handle it. To replicate the possible Chrome implementation on Firefox, and possibly other browsers, I had to send the abort request to the main thread and then abort it the same way a manual abort would.
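Very roughly, the forwarding looks something like the sketch below (a simplified illustration, not the actual commit; the MessageChannel handshake, the message shape and the variable names are just placeholders here):

// page side (hypothetical): hand the worker a port it can use to report aborts
const abortChannel = new MessageChannel();
// assumed handshake: transfer abortChannel.port2 to the (service) worker
let cancelled = false;
abortChannel.port1.onmessage = (evt) => {
  if (evt.data && evt.data.aborted) {
    cancelled = true;
    fileStream.abort().catch(() => {});  // the writer from the snippet above
  }
};

// worker side (hypothetical): `abortPort` is the transferred abortChannel.port2
const stream = new ReadableStream({
  cancel(reason) {
    // fires when the user cancels the download in the browser UI
    abortPort.postMessage({ aborted: true, reason: String(reason) });
  }
});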

@jimmywarting do you believe this commit could be merged or there are any other cases I'm not handling properly? https://github.com/gwdp/StreamSaver.js/commit/f9e375e6795a9b64049e08c47c6f7fda2e12b858

jimmywarting commented 2 years ago

@gwdp hmm, yea maybe.

However, I would like to rewrite the whole main-thread <-> service-worker communication. When I wrote my native-file-system-adapter I took what I learned from StreamSaver and fixed the problems that I had in StreamSaver.

When readable streams are not transferable...

...Then we use a MessageChannel to send each written chunk off to the service worker, where I have created a ReadableStream in order to save it. The current problem is that when we do this.fileStream.write(d), the chunk is immediately sent off to the service worker via postMessage and fileStream.write(d) resolves right away. This causes a problem because you do not know when/if it has been written, so it can't handle the backpressure, and you will start to have a memory problem if you write data faster than it can be written to disk.

So rather than pushing data from the main thread to the service worker, I instead built a ReadableStream in the service worker that pulls data, asking the main thread for more. That way you know for sure that data has been written to disk, and the service worker only asks for more data once fileStream.write(d) has resolved.
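Roughly, the pull model looks like this (a minimal sketch, not the actual native-file-system-adapter code; the pull/chunk/done message protocol is invented for this illustration):

// service worker side: a ReadableStream that pulls chunks from the page.
// `port` is the MessagePort shared with the page.
function remoteReadable(port) {
  return new ReadableStream({
    pull(controller) {
      port.postMessage({ pull: true });            // ask the page for one chunk
      return new Promise((resolve) => {
        port.onmessage = (evt) => {
          if (evt.data.done) controller.close();
          else controller.enqueue(evt.data.chunk);
          resolve();                               // pull resolves only after the chunk arrived
        };
      });
    }
  });
}

// page side: write() only resolves once the service worker has asked for the chunk,
// so backpressure propagates all the way back to the producer.
function remoteWritable(port) {
  let pending = 0;
  let wake = () => {};
  port.onmessage = () => { pending++; wake(); };   // a pull request arrived
  const nextPull = async () => {
    while (pending === 0) await new Promise((resolve) => { wake = resolve; });
    pending--;
  };
  return new WritableStream({
    async write(chunk) {
      await nextPull();
      port.postMessage({ chunk });
    },
    async close() {
      await nextPull();
      port.postMessage({ done: true });
    }
  });
}

// usage (page): const { port1, port2 } = new MessageChannel();
// transfer port2 to the service worker, then: someReadable.pipeTo(remoteWritable(port1))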

The source is inspired by @MattiasBuelens' remote-web-streams. I would like to use this πŸ‘†: it is a tighter glue between two worker threads and much closer to how a transferable stream is supposed to work, communicating back and forth with backpressure.

edit: i wish he would have named it transferable-stream-shim or polyfill so it could be easier to google :P

gwdp commented 2 years ago

Wow :exploding_head:. If I understood properly, that would involve a major refactor of the code (a simplification as well); sounds like the next release of StreamSaver? :nerd_face:

I have been using StreamSaver for a while now and I only found out about native-file-system-adapter yesterday. After a good read of the code and its usage, I still believe StreamSaver has its own growing use case that is not entirely replaceable by native-file-system-adapter. In my case, I have been using it for compressing stuff before sending it to the fs; however, on my to-do list is customizing the worker so the compression happens there and not in the client, which is causing me a double backpressure problem when bandwidth spikes and disk/CPU are busy. If your proposal is to use https://github.com/MattiasBuelens/remote-web-streams to refactor the communication, I might be able to draft something out in the upcoming week, since I need to do something about this compression problem anyway.

For the download-cancelled-event issue, I believe this will need to be handled anyway; opening a PR for that so other folks can have this fixed in the current version :)

MattiasBuelens commented 2 years ago

edit: i wish he would have named it transferable-stream-shim or polyfill so it could be easier to google :P

I know, I know. πŸ˜› Granted, I made that library before transferable streams were defined in the spec, so I didn't know what the spec would end up calling it.

Also, I'm hesitant to call it a "polyfill". For it to be a proper polyfill, it would need to patch postMessage() and the message event, and then transform the message object to find and replace any streams that also appear in the transferList. That means partially re-implementing StructuredSerializeInternal, so you can replace a ReadableStream in a message like { deeply: { nested: { stream: readableStream } } }. And I really couldn't be bothered with that. πŸ˜…

guest271314 commented 2 years ago

For Chrome Canary version 99.0.4828.0 I was able to handle user-level cancellation by simply checking whether write would throw (not the cleanest handling, but it does work).

One way to check when close() has been called on the writer, for example from a different thread:

try {
  if (writable.locked) {
    await writer.ready;
    await writer.write(value);
  }
} catch (e) {
  console.warn(e.message);
}

If necessary, handle "cannot write to a closing writable stream" error(s).
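A self-contained version of the pattern, for context (assumed setup; the setTimeout just stands in for close() being called from elsewhere):

// assumed setup: a TransformStream whose writable side we keep writing to
const { readable, writable } = new TransformStream();
const writer = writable.getWriter();

// consumer of the readable side
readable.pipeTo(new WritableStream({ write(chunk) { console.log('got', chunk); } }));

// stand-in for "close() is called elsewhere"
setTimeout(() => writer.close(), 1000);

(async () => {
  for (let i = 0; ; i++) {
    try {
      if (writable.locked) {
        await writer.ready;
        await writer.write(new Uint8Array([i]));
      } else {
        break;
      }
    } catch (e) {
      // write() rejects once the stream is closing or closed
      console.warn(e.message);
      break;
    }
    await new Promise((resolve) => setTimeout(resolve, 100));
  }
})();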

jimmywarting commented 2 years ago

One hacky thing I'm doing in StreamSaver is transferring the MessagePort via another message channel to avoid the so-called "double transfer problem".

More context here: https://github.com/whatwg/streams/issues/276#issuecomment-482749288

It's essential for StreamSaver to work. I don't think remote-web-streams takes this into account?

MattiasBuelens commented 2 years ago

You can re-transfer the returned MessagePort as many times as you want, and then construct the ReadableStream in the final realm.

// main.js
const readable = new ReadableStream({ /* ... */ });
const { writable, readablePort } = new RemoteWritableStream();
readable.pipeTo(writable).catch(() => {});
const worker1 = new Worker('./worker1.js');
worker1.postMessage({ readablePort }, [readablePort]);

// worker1.js
const worker2 = new Worker('./worker2.js');
self.onmessage = (event) => {
  const { readablePort } = event.data;
  worker2.postMessage({ readablePort }, [readablePort]);
}

// worker2.js
self.onmessage = (event) => {
  const { readablePort } = event.data;
  const readable = RemoteWebStreams.fromReadablePort(readablePort);
  const reader = readable.getReader();
  // ...
}

As long as you only call fromReadablePort() once, it should be fine.

jimmywarting commented 2 years ago

Ty for the instructions and the use of RemoteWebStreams.fromReadablePort, but isn't that πŸ‘† vulnerable to that "double transfer problem"? I don't see any use of MessageChannel in that example; what happens when worker1 is terminated/closed?

jimmywarting commented 2 years ago

@MattiasBuelens wouldn't it kind of be like this instead?


// To avoid double transfer problem
// 1. creates a `readablePort` that is posted to a MessageChannel
// 2. then send that MessageChannel to worker1 and then forward it to worker2 
// 3. worker #2 listen for one message of the MessageChannel that includes the `readablePort`
// 4. worker #2 sets up a readable with `RemoteWebStreams.fromReadablePort(readablePort)`
// 5. worker1 can now be terminated

const readable = new ReadableStream({ /* ... */ })
const { writable, readablePort } = new RemoteWritableStream()
readable.pipeTo(writable).catch(console.error)

const bridge = new MessageChannel()
bridge.port1.postMessage({}, [readablePort])

const worker1 = new Worker('./worker1.js')
worker1.postMessage({}, [bridge.port2])

// --------------------------------------------------------------------------------

// worker1.js (only forwards the MessageChannel bridge to worker2)
const worker2 = new Worker('./worker2.js')
globalThis.onmessage = evt => worker2.postMessage({}, [evt.ports[0]])

// --------------------------------------------------------------------------------

// worker2.js
globalThis.onmessage = evt => {
  const [bridgePort] = evt.ports
  // listen for one message on the BridgePort that will include the readablePort
  bridgePort.onmessage = evt => {
    // readablePort is now directly transferred from main thread to worker2 without
    // being piped through worker1 
    const [readablePort] = evt.ports
    const readable = RemoteWebStreams.fromReadablePort(readablePort)
    const reader = readable.getReader()

    // clean up
    bridgePort.onmessage = null
    bridgePort.close()
  }
}

MattiasBuelens commented 2 years ago

but isn't that πŸ‘† vulnerable to that "double transfer problem"? I don't see any use of MessageChannel in that example; what happens when worker1 is terminated/closed?

Actually, MessagePorts can safely be re-transferred. When you transfer a port, you stop receiving messages in your original realm and the new realm can start receiving its messages instead. Even if the intermediate realm gets terminated, the port still works.

It doesn't really work with the example of worker1 spawning a new worker2, because worker2 cannot outlive worker1 (terminating worker1 also terminates worker2). But you can easily demonstrate this with passing a port back and forth between main and worker:

// main.js
const readable = new ReadableStream({
  _i: 0,
  pull(c) {
    c.enqueue(++this._i);
    if (this._i === 10) {
      c.close();
    }
  }
});
const { writable, readablePort } = new RemoteWebStreams.RemoteWritableStream();
readable.pipeTo(writable).catch(console.error);

// Send the port to the worker
const worker = new Worker("./worker.js");
worker.postMessage({ readablePort }, [readablePort]);

// Receive the port back from the worker
worker.addEventListener('message', evt => {
  // Terminate the worker
  // This shows that the transferred port continues to work, and no longer "goes through" the worker
  worker.terminate();
  // Construct the transferred stream and consume it
  const { readablePort } = evt.data;
  const readable = RemoteWebStreams.fromReadablePort(readablePort);
  readable.pipeTo(new WritableStream({
    write(chunk) {
      console.log('received', chunk);
    }
  }));
});

// worker.js
globalThis.onmessage = evt => {
  // Send the port back to main thread.
  const { readablePort } = evt.data;
  globalThis.postMessage({ readablePort }, [readablePort]);
};

The "double transfer problem" with streams is slightly different. We always construct a new MessageChannel when transferring a ReadableStream, even if that stream was already transferred before. You can simulate this behavior with:

// worker.js
globalThis.onmessage = evt => {
  // Construct a ReadableStream from the given port
  const readable = RemoteWebStreams.fromReadablePort(evt.data.readablePort);
  // Transfer the ReadableStream again
  // This creates a new MessageChannel, independent from the original port
  const { writable, readablePort } = new RemoteWebStreams.RemoteWritableStream();
  // Pipe the two streams together. This is what "connects" the original port and the newly created port.
  readable.pipeTo(writable).catch(console.error);
  // Send the new message port back to main thread
  globalThis.postMessage({ readablePort }, [readablePort]);
};

Here, it's not possible to terminate the worker, since its pipeTo() is passing chunks received from the first port (evt.data.readablePort) to the second port (readablePort). That's the root cause of the double transfer problem.

In order to solve this, the streams specification needs to be changed such that, if a ReadableStream was constructed through the "transfer-receiving steps", then the "transfer steps" should not set up a new MessageChannel and pipeTo(). Instead, it should re-use the original MessagePort that we used to construct the stream. That way, we get the same behavior as in the first example, where the worker simply re-transfers the port. πŸ˜‰

Of course, the streams specification has to deal with lots of other annoying details, such as "what if the worker first reads a couple of chunks from the stream and then re-transfers it?". remote-web-streams doesn't attempt to solve that right now.

jimmywarting commented 2 years ago

Well, I'm not actually using two web workers... I have:

  1. The main thread
  2. A 3rd-party hidden iframe or a visible popup whose sole purpose is to install a service worker (only the popup gets destroyed as soon as the port has been sent to the service worker) (we can call this iframe or popup worker1)
  3. A service worker (we could call this worker2)

guest271314 commented 2 years ago

@gwdp For completeness, when closing the writable side in the pattern, release the lock as well so the if condition holds at that instant:

    await writer.close();
    await writer.closed;
    console.log('Writer closed.');
    writer.releaseLock();

If a handler is attached where write() is called, multiple errors can be handled in the interim between the handler being dispatched multiple times in the same span of time, though in my case it will still enqueue all necessary chunks.

MattiasBuelens commented 2 years ago

Well, I'm not actually using two web workers...

I believe it should work regardless of whether you're using a worker, a service worker or an iframe. MessagePort and postMessage() should work the same in all of them.

But yes, with your current approach, you get the benefits of using native transferable streams, while also avoiding the double transfer problem. With remote-web-streams, you wouldn't be able to use native transferable streams, you'd be manually passing message ports around instead.

Hopefully we can one day fix this double transfer problem in the spec, and then you can finally ditch your "message channel containing a single message with a readable stream" workaround too. πŸ™

guest271314 commented 2 years ago

once that is done i terminate the 2nd window by closing/removing it and then the stream gets closed or something. i'm not sure if it's a bug or if it's something that isn't spec:ed.

Yes, it is specified, to an appreciable degree https://github.com/w3c/ServiceWorker/issues/980. The ServiceWorker has no WindowClient or Worker client at that point.

Depending on when the ServiceWorker is constructed and how long the download takes, persistent data can be lost if it is not cached or stored somewhere outside of the ServiceWorker. The fact that a MessageChannel handler is used in the ServiceWorker, and that a ReadableStream is being read from/written to in the ServiceWorker, has no impact on the ServiceWorker's 5-minute inactivity limit for events. If a variable is declared, the ServiceWorker effectively deletes the variable and restarts again, disentangling all MessagePorts. That is particularly problematic for extension code in Chromium/Chrome, which uses an MV3 ServiceWorker and Native Messaging (https://bugs.chromium.org/p/chromium/issues/detail?id=1152255; https://bugs.chromium.org/p/chromium/issues/detail?id=1189678). If construction of the ServiceWorker and download of all content does not occur within 5 minutes, data can be lost. If pausing and resuming a download is a requirement, it still needs to happen within the 5-minute span, else, again, without storing the data in some temporary container, the data can be lost.

To avoid that, see https://github.com/guest271314/persistent-serviceworker for solutions to persist the same ServiceWorker on both Firefox and Chrome.

guest271314 commented 2 years ago

once that is done i terminate the 2nd window by closing/removing it and then the stream gets closed or something. i'm not sure if it's a bug or if it's something that isn't spec:ed.

The simplest solution for that case, without changing too much of your existing code pattern, is to keep the <iframe> in the document that created it and post a message to the ServiceWorker every 15 seconds, something like

// inside the kept-alive <iframe>: forward service worker messages to the parent
// and ping the service worker every 15 seconds
function closeMessageChannel() {
  parent.postMessage('close', name);
}
navigator.serviceWorker.oncontrollerchange = async (e) => {
  console.log(e);
  closeMessageChannel();
};
navigator.serviceWorker.onmessage = async (e) => {
  parent.postMessage(e.data, name, e.ports);
  try {
    while ((await navigator.serviceWorker.ready).active) {
      (await navigator.serviceWorker.ready).active.postMessage(null);
      await new Promise((resolve) => setTimeout(resolve, 1000 * 15));
    }
    closeMessageChannel();
  } catch (err) {
    console.error(err);
    closeMessageChannel();
  }
};

jimmywarting commented 2 years ago

My hidden iframe (in the case where a webpage uses secure https) is kept alive and pings the service worker every now and then... but only if the browser does not support transferable streams.

in that case a "real" transferable ReadableStream is sent to the service worker and responded with evt.respondWith(new Response(transfered_readableStream)), then we don't need any ping/keep-alive hack and the 5 Minutes lifetime dose not matter.

guest271314 commented 2 years ago

Then what issue are you having?

Is this

once that is done i terminate the 2nd window by closing/removing it and then the stream gets closed or something. i'm not sure if it's a bug or if it's something that isn't spec:ed.

not the case anymore?

jimmywarting commented 2 years ago

@guest271314 Can we please stop talking about the double transfer problem now? It feels a bit off topic.

In StreamSaver's case the "double transfer problem" is only an issue when the site is insecure and the browser doesn't support transferable streams... When the site is insecure it will open up a secure site in a popup for a brief second, only to transfer a transferable stream in the best-case scenario; in the worst case it must transfer a MessagePort to the service worker. After that the popup is closed automatically. This is the only time when we can't use the postMessage hack to keep the service worker alive for a longer duration.

MattiasBuelens reproduced the double transfer problem quite well before in a minimal working example here: https://github.com/whatwg/streams/issues/276#issuecomment-482801869

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Re-transferring streams</title>
</head>
<body>
<script id="worker-script">
    if (self.importScripts) {
        self.onmessage = (ev) => {
            const rs = ev.data;
            self.postMessage(rs, [rs]);
        };
    }
</script>
<script>
    var rs = new ReadableStream({
        start(c) {
            c.enqueue('a');
            c.enqueue('b');
            c.enqueue('c');
        }
    });
    var workerCode = document.getElementById('worker-script').innerHTML;
    var worker = new Worker(URL.createObjectURL(new Blob([workerCode], { type: 'application/javascript' })));
    worker.onmessage = async (ev) => {
        var transferred = ev.data;
        worker.terminate(); // comment this line to "fix" the pipe
        transferred.pipeTo(new WritableStream({
            write(x) {
                console.log(x);
            },
            close() {
                console.log('closed');
            }
        }));
    };
    worker.postMessage(rs, [rs]);
</script>
</body>
</html>

which leads to this chain:

original ReadableStream <--> MessageChannel <--> worker's ReadableStream <--> MessageChannel <--> re-transferred ReadableStream

I have gotten around this problem by sending the message channel like I demonstrated above with MessageChannel here: https://github.com/jimmywarting/StreamSaver.js/issues/13#issuecomment-1013677071, in order to get a direct pipeline from point A to B and a shorter chain.

I go to great lengths to get StreamSaver to work on both secure and insecure pages without forcing the developer to install a service worker themselves.

That is why StreamSaver uses such an ugly hack to get things working. I wanted it to be as easy as installing a dependency on StreamSaver and expecting it to work with a simple import from npm or a CDN.

StreamSaver never installs a service worker on the developer's domain and never interferes with any existing service worker they might have set up already. That's why I have a so-called "man in the middle" iframe/page that makes it easy to use StreamSaver from insecure pages and from sites like jsfiddle where you can't install a service worker or don't want to go to the trouble of installing one yourself.

guest271314 commented 2 years ago

but if the site is insecure then i can't install the service worker and communicate with it in any possible way

Well, yes, you can. The most secure route is for the user to run code on their own machine, which is possible with Native Messaging.

Can we please stop talking about the double transfer problem now? it feels a bit of topic

I'm just trying to provide a perspective from outside looking in. A ServiceWorker is not necessary at all to achieve downloading files locally.

Good luck.

jimmywarting commented 2 years ago

Native Messaging isn't really a good option, as it requires installing an extension and a native application, and it doesn't work in all browsers.

I'm just trying to provide a perspective from outside looking in. A ServiceWorker is not necessary at all to achieve downloading files locally.

Yup,

guest271314 commented 2 years ago

Native Messaging isn't really a good option, as it requires installing an extension and a native application, and it doesn't work in all browsers.

Native Messaging works in Firefox and Chrome.

The user installs the extension locally. On Chromium simply select "Developer mode" at chrome://extensions then click "Load unpacked". No external resources or requests or Chrome Web Store are required.

Then you can use the same <iframe>-loading approach with a local file listed in "web_accessible_resources" and postMessage() to the parent, without using a ServiceWorker (or CDN) at all. If you do want to use a ServiceWorker for some reason, e.g., for the user to click on the UI and commence the download, you can still do that.

That allows you to use ServiceWorker on any origin, securely, without necessarily being on an HTTPS page online.

If you are talking about insecure pages and requests to CDNs, all of those concerns go away when you are only running code that is already on your own machine.

Alternatively, use the Native Messaging host to download directly to the users' local file system.

I already filed a PR for this capability. You appeared to think unpacking a local extension involves external code or requests; it does not, and it is at least as secure as the MITM code and the CDN requests you are already using.

jimmywarting commented 2 years ago

The user installs the extension locally. On Chromium simply select "Developer mode" at chrome://extensions then click "Load unpacked". No external resources or requests or Chrome Web Store are required.

No visitor on your web site is ever going to do this... and you shouldn't force the user to do this either.

guest271314 commented 2 years ago

I think you misunderstand what I am conveying. I am not talking about visiting websites, or StreamSaver used by websites. I am talking about the end user that wants to use your StreamSaver code on their own machine. Native Messaging is useful for me and other users to, for example, stream speech synthesis engine output from a local speech synthesis engine as a ReadableStream piped to MediaStreamTrackGenerator, or stream live system audio output to the user on any origin. Chrome prevents both of the former by not outputting audio from speechSynthesis.speak() on the tab and by not capturing monitor devices on *nix OSes.

From my perspective the StreamSaver concept is based on streaming downloads of files in any browser. So you can basically do the opposite of what I do here https://github.com/guest271314/captureSystemAudio/tree/master/native_messaging/capture_system_audio (using postMessage(transfer, [transfer]) for Firefox)

onload = () => {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const id = 'capture_system_audio';
  let port = chrome.runtime.connectNative(id);
  let handleMessage = async (value) => {
    try {
      if (writable.locked) {
        await writer.ready;
        await writer.write(new Uint8Array(JSON.parse(value)));
      }
    } catch (e) {
      // handle cannot write to a closing stream 
      console.warn(e.message);
    }
  };
  port.onDisconnect.addListener(async (e) => {
    console.warn(e.message);
  });
  port.onMessage.addListener(handleMessage);
  onmessage = async (e) => {
    const { type, message } = e.data;
    if (type === 'start') {
      port.postMessage({
        message,
      });
      parent.postMessage(readable, name, [readable]);
    }
    if (type === 'stop') {
      try {
        await writer.close();
        await writer.closed;
        console.log('Writer closed.');
        writer.releaseLock();
        port.onMessage.removeListener(handleMessage);
        port.disconnect(id);
        port = null;
        parent.postMessage(0, name);
        onmessage = null;
        await chrome.storage.local.clear();
      } catch (err) {
        console.warn(err.message);
      }
    }
  };
  parent.postMessage(1, name);
};

Native Messaging allows you to deploy your concept locally, to your precise specification, without reliance on any external code, for end users.

Perhaps I misunderstand, and your code is written primarily for web site developers exclusively:

StreamSaver.js is the solution to saving streams on the client-side. It is perfect for webapps

not for individual users that want to run your gear locally on any site they happen to be visiting.

Again, good luck.