whatwg / streams

Streams Standard
https://streams.spec.whatwg.org/

Add URL.createObjectURL(stream) support #480

Open alippai opened 8 years ago

alippai commented 8 years ago

If I'm right this only requires a wrapper which adds (mime)type information - like it's available for MediaStream, File and Blob.

domenic commented 8 years ago

No, this would be a large spec effort which we have no plans to pursue at this time. Explaining how every feature on the platform that takes a URL can be modified to work with a stream is a lot of work. What is the processing model for e.g. infinite streams? Streams of JavaScript objects? Streams with long delays? Empty streams? You can imagine ways to try to address this generically, but they are not high priority compared to the rest of the streams work.

alippai commented 8 years ago

The very same questions apply to service worker fetch events too.

By spec it's already possible to construct a response with streams: event.respondWith(new Response(stream, { headers: {'Content-Type': 'text/html'} }));

Aren't fetch() streams in service workers the same?
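As a minimal sketch of the point above, a stream can be wrapped in a Response so that it carries a MIME type. The helper name `htmlResponseFromChunks` is made up for illustration; only `ReadableStream`, `TextEncoder` and `Response` are platform APIs.

```javascript
// Sketch: wrap a byte stream in a Response so it carries a MIME type.
// `htmlResponseFromChunks` is a hypothetical helper name, not a platform API.
function htmlResponseFromChunks(chunks) {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    start(controller) {
      // Enqueue each string as UTF-8 bytes, then signal the end of the stream.
      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });
  return new Response(stream, { headers: { 'Content-Type': 'text/html' } });
}

// In a service worker this would be used roughly as:
//   self.addEventListener('fetch', (event) => {
//     event.respondWith(htmlResponseFromChunks(['<h1>', 'hi', '</h1>']));
//   });
```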

domenic commented 8 years ago

The difference is there is no API img.response = response which would have to deal with those cases. There is, however, an img.src = url.

jimmywarting commented 8 years ago

I kind of did this in my StreamSaver

When a stream gets created, it signals the service worker that it should intercept a link with a new response. That response gets constructed with a new ReadableStream, and you get back a unique link. Then the only thing left to do is pipe the stream to the service worker and use that link from somewhere.

I just now realized I could have made this lib a bit more generic by implementing something like URL.createObjectURL(stream) and creating a link with the download attribute instead of using Content-Disposition. Then you could even use it for other things like img/video/script/style tags as well.
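The interception idea described above could be sketched roughly as follows. This is not StreamSaver's actual code: the registry, the `/intercepted/` URL scheme and both function names are assumptions made for illustration, and the step that gets the stream from the page into the service worker is left out.

```javascript
// Hypothetical registry living inside the service worker: URL -> pending stream.
const pendingStreams = new Map();
let nextId = 0;

// Called when the page hands a stream over; returns the unique link.
function registerStream(stream, headers = {}) {
  const url = `/intercepted/${nextId++}`; // made-up URL scheme
  pendingStreams.set(url, { stream, headers });
  return url;
}

// Core of the fetch handler: answer a registered URL with the stream, once.
function respondTo(url) {
  const entry = pendingStreams.get(url);
  if (!entry) return null; // not ours: let the request go to the network
  pendingStreams.delete(url); // a stream can only be consumed once
  return new Response(entry.stream, { headers: entry.headers });
}

// In the service worker, roughly:
//   self.addEventListener('fetch', (event) => {
//     const response = respondTo(new URL(event.request.url).pathname);
//     if (response) event.respondWith(response);
//   });
```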

isonmad commented 8 years ago

The difference is there is no API img.response = response which would have to deal with those cases. There is, however, an img.src = url.

I don't see the distinction. A Response in a service worker effectively becomes the URL on the main thread. Can't createObjectURL just work on, and be limited to, byte streams? It would be a convenience function around the (still-unimplemented) transfer of the ReadableStream to the service worker, on top of some kind of handshake where you set/receive the URL through which you can then retrieve the contents of said stream. (Wait, you'd also need to find a way to keep the service worker instance 'alive' and containing the received ReadableStream up until it actually receives the request and can use it in the Response, so maybe those steps should be reversed.)

Although having a MIME type of anything but octet-stream would be difficult with passing just a stream by itself, like alippai said. Could createObjectURL(new Response()) get specced instead as the appropriate wrapper, instead of requiring a full service worker to accomplish the same result? Would that be an issue to file with the fetch api, or is some other repo appropriate?

mikeal commented 7 years ago

This is going to sound like hyperbole but it really isn't: this is probably the most important thing ReadableStream could do to advance the web platform.

Let me explain.

For quite a while now we've had alternative transports available in the browser (WebSockets & WebRTC data channel). With that we've built even higher order transports like WebTorrent to distribute content. I've also seen different tarball and other multi-file compression bundles used in these data channels as well as within the new fetch() API.

Today, the usefulness of these transports is hampered by the fact that they are locked out of a substantial portion of the web platform. The reason people are mostly sending application data rather than actual content is that you can't effectively use media elements with non-HTTP transports unless you can buffer all of the content up-front into a bundle that can be turned into a single Blob and then sent to createObjectURL().
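The buffer-everything workaround described above looks roughly like this. It is a minimal sketch; `bufferToObjectURL` is a hypothetical helper name, and the point is only that every chunk has to sit in memory before a URL exists.

```javascript
// Sketch of the buffer-everything workaround: drain the stream, then make a Blob URL.
// `bufferToObjectURL` is a hypothetical helper name.
async function bufferToObjectURL(stream, type) {
  const chunks = [];
  const reader = stream.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value); // every chunk stays in memory until the stream ends
  }
  const blob = new Blob(chunks, { type });
  // The caller is responsible for calling URL.revokeObjectURL(url) later.
  return URL.createObjectURL(blob);
}
```

Nothing can be handed to a media element until the last chunk has arrived, which is why this does not work for live or unbounded content.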

I understand your perspective @domenic because you're thinking pretty directly in terms of the current implementations but at a purely technical level there is no difference between a ReadableStream and an HTTP network request in the terms you've outlined (can be zero, can be undetermined length, etc).

I can't tell you the backflips I've gone through in the last six months that would be unnecessary if this feature were added. I've added ServiceWorkers in order to build an HTTP interface for streaming data, I've built HTTP servers into Electron apps to do the same, I've created entire micro-service infrastructures to "fake" live files being pushed over WebSockets. And all of these end in user experiences that are much worse than what would be very simple were this feature available.

Today, ReadableStreams aren't all that useful because there's a limited number of APIs that expose and consume them. The list of non-ReadableStream "streaming" interfaces in the browser is absurd: MediaStream, File API, RTCDataChannel, WebSocket, whatever the hell the Speech API creates, etc. If we were able to create object URLs from a ReadableStream we could start to build compatibility layers from these non-standard interfaces that could be used in other parts of the web platform that just need a URL. If we want ReadableStream to become more widely used a good strategy would be to make it the obvious compatibility layer between all of these incompatible APIs and anything that takes a URL.

jimmywarting commented 7 years ago

I did like @isonmad's idea about making an object URL from a Response instead. It's more useful than creating an object URL from a ReadableStream: that way you can also add headers to define Content-Length, Content-Type etc. You could create a Content-Disposition header to define a filename for download.

That way img.src would work very well, for example. And you can get a download percentage if you know the Content-Length.
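To illustrate the download percentage point, here is a hedged sketch of reading a Response body while reporting progress from its Content-Length header. `readWithProgress` and `onProgress` are hypothetical names, not part of any API.

```javascript
// Sketch: report download progress from a Response that declares Content-Length.
// `readWithProgress` and `onProgress` are hypothetical names.
async function readWithProgress(response, onProgress) {
  // Number(null) is 0, so a missing header simply disables progress reporting.
  const total = Number(response.headers.get('Content-Length'));
  const reader = response.body.getReader();
  let received = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    received += value.byteLength;
    // Without a usable Content-Length there is no percentage to report.
    if (total > 0) onProgress(Math.round((received / total) * 100));
  }
  return received;
}
```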

You would just have to wrap stream in a Response and you would be good to go

URL.createObjectURL(new Response(new ReadableStream({...}), opts))

mikeal commented 7 years ago

@jimmywarting @isonmad Using Response this way might open up a Pandora's box of security issues.

The security model for Blobs is pretty much identical to what we want from ReadableStreams. Service Workers (where Responses are currently constructed by user JS code) sidestep a lot of security issues by limiting how the worker can be installed and restricting the domain. The rest of the platform doesn't have that luxury.

Adding a mimeType property to ReadableStream or createObjectURL would probably be much easier.

domenic commented 7 years ago

Can you describe the exact threat model you're anticipating? A Response is just a small wrapper around a ReadableStream + Headers.

mikeal commented 7 years ago

@domenic aren't some of those headers XSS related? I'm assuming that in ServiceWorkers, because you are caching and reconstructing remote requests, you can create responses that appear to be from foreign domains. Maybe this is already solved and those are all blocked or ignored.

jimmywarting commented 7 years ago

What would be the difference of just doing URL.createObjectURL(response) when a service worker can construct its own response with a ReadableStream and do just the same thing?

wanderview commented 7 years ago

The URL returned by createObjectURL is reusable, but a ReadableStream is consumable. This mismatch is my main objection to this. I don't think we should create more implicit buffering in the browser.
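The reusable versus consumable mismatch can be shown with a small sketch. `readTwice` is a hypothetical helper; it reads a source twice and reports whether the second read succeeded.

```javascript
// Sketch: a Blob can be read repeatedly, a stream-backed Response only once.
// `readTwice` is a hypothetical helper name.
async function readTwice(makeSource) {
  const source = makeSource();
  const first = await source.text();
  try {
    const second = await source.text();
    return { first, second, reusable: true };
  } catch (err) {
    // Response.text() rejects with a TypeError once the body has been used.
    return { first, second: null, reusable: false };
  }
}
```

Calling `readTwice(() => new Blob(['abc']))` yields a reusable source, while `readTwice(() => new Response(someStream))` does not, which is why a reusable URL backed by a stream would need buffering somewhere.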

Also, we see repeated cases of sites failing to call revokeObjectURL. The API is just leak prone.

I think we would be better served making various .src attributes take Response or ReadableStream objects directly.

mikeal commented 7 years ago

What would be the difference of just doing URL.createObjectURL(response) when a service worker can construct its own response with a ReadableStream and do just the same thing?

I've actually done this hack, it's not pleasant and isn't really what service workers were designed for. Also, you can't create libraries and an ecosystem on top of ServiceWorkers because they can't be loaded dynamically by any ol' code, they have to be installed in a special way which limits their utility. There are some good reasons for them to be done this way, I'm not trying to denigrate them, but we can't rely on them for something like this.

mikeal commented 7 years ago

I think we would be better served making various .src attributes take Response or ReadableStream objects directly.

How practical is this?

Would this require reaching into the spec for every element we want to do this to and getting an update?

I don't think that anyone here cares about specifically using createObjectURL. If a simpler path is a new method either on the URL object or a new method on Response or ReadableStream it doesn't make much of a difference.

I'll defer to people with more experience in the spec process to answer this, but which approach do people think is simplest/fastest to get us to the point where we can use ReadableStreams as sources to the same degree that we can currently use Blobs?

isonmad commented 7 years ago

The URL returned by createObjectURL is reusable, but a ReadableStream is consumable.

Right, so obviously for this case the spec would have to declare that, unlike blob: URIs created from Blobs and MediaStreams, URIs created from a Response object can only be fetched once, and when attempting to fetch it, it first checks if the stream is disturbed and gives a network error / the URI is automatically revoked.

That would actually be less leak prone, so benefits all around.
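The one-shot behaviour proposed above could be modelled like this. Everything here is hypothetical: the `oneshot:` scheme, `createOneShotURL` and `fetchOneShot` exist in no spec, and `response.bodyUsed` stands in for the "is the stream disturbed" check.

```javascript
// Hypothetical one-shot URL registry; none of these names exist in any spec.
const oneShot = new Map();
let counter = 0;

function createOneShotURL(response) {
  const url = `oneshot:${counter++}`; // made-up scheme for illustration
  oneShot.set(url, response);
  return url;
}

function fetchOneShot(url) {
  const response = oneShot.get(url);
  oneShot.delete(url); // automatically revoked on the first fetch attempt
  // Unknown URL, already fetched, or body already read: network error.
  if (!response || response.bodyUsed) return Response.error();
  return response;
}
```

Because the entry is removed on the first fetch attempt, forgetting to revoke cannot leak the stream for longer than one request.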

I think we would be better served making various .src attributes take Response or ReadableStream objects directly.

That might help the MediaStream use case, but it doesn't help e.g. downloading files that are generated/processed client-side that are too large to hold all in memory at once.

jakearchibald commented 7 years ago

Service Worker F2F: We much prefer the idea of giving a Response to the element, via a .srcObject property. What about a promise that resolves to a response? We think that sounds good too.

Action item: figure out what to do with .currentSrc etc.

Daniel-Abrecht commented 6 years ago

I just noticed this shortcoming today too. I think URL.createObjectURL(response) would be really neat to have: everything can take URLs, and the functionality seems to already exist in browsers in service workers. It just doesn't make sense to force people to build such complicated workarounds for something that useful. I don't really care how it's done, just some way of doing it would be nice.

Alternatively or additionally, can we just drop .srcObject and allow such objects, including Response objects, to be assigned to any .src and .href attribute?

neben commented 5 years ago

Is there any progress on this?

benwiley4000 commented 3 years ago

Personally I could benefit a lot from the ability to turn an arbitrary readable stream into a URL.

I have a download button for large files, which must wait for a multipart data fetch to complete (this is the only fetch method currently available to me), parse a File from the result, then use the invisible anchor tag hack to initiate a download. This means the user isn't able to do anything nice like watch the progress as the file downloads, or open the file in a new tab and wait for it to load. Just a long pause and then... all downloaded!

What would be much better is if I could take a ReadableStream of the main file content which itself would read the body stream of the Fetch API (I already have this for streaming text and video content onto the page). Then I could create a URL out of that stream and its content type, render an <a> tag, and the browser could either load it in a separate tab or initiate a client download. I don't know how niche my use case is, but to be able to solve it would be extremely cool.

ETA: I am looking more into using service workers for this, and it seems like it could maybe be a good solution.

jimmywarting commented 3 years ago

ETA: I am looking more into using service workers for this, and it seems like it could maybe be a good solution.

Just a heads up: service workers are short-lived and can interrupt the request if it takes too long 😞.

@benwiley4000 have a look at this if you want to save large files with streams:

benwiley4000 commented 3 years ago

@jimmywarting how long before interruptions take place? Do the tools you shared support download via URL?

jimmywarting commented 3 years ago

how long before interruptions take place?

@benwiley4000 I think it was something like 30s in FF and 5m in Chrome. I can't remember exactly; it was a long time ago and it might have changed since. The timer resets when an event comes in, like a fetch, postMessage or push event (https://github.com/w3c/ServiceWorker/issues/980). Somewhere it was said that it should be allowed to stay alive for the duration of a controlling window handling a request/response that responded with a custom response; closing the window would mean that the timer would start or the response would be interrupted.

But if you are only responding with a Response that isn't built as new Response(new ReadableStream(...)), and the service worker doesn't have to do any work (a transformation like decryption or anything like that), then the service worker can be terminated safely.

Me and others would love to have long-lived service workers...

Do the tools you shared support download via URL?

Not per se, but you can do fetch(request).then(response => response.body.pipeTo(writableStream))
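A slightly fuller sketch of that pipeTo pattern follows. `collectingSink` is a hypothetical stand-in for whatever WritableStream the saving library provides; here it only gathers chunks in an array.

```javascript
// Sketch: pipe a response body into a WritableStream sink, chunk by chunk.
// `collectingSink` is a hypothetical stand-in for a real file-writing sink.
function collectingSink(chunks) {
  return new WritableStream({
    write(chunk) {
      chunks.push(chunk); // a real sink would write the chunk to disk here
    },
  });
}

async function download(response, writableStream) {
  // pipeTo resolves once the source is drained and the sink is closed.
  await response.body.pipeTo(writableStream);
}

// Usage, assuming `request` and `writableStream` already exist:
//   fetch(request).then((response) => download(response, writableStream));
```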

The StreamSaver and the native file system adapter lib keeps pinging the service worker with postMessage until the stream is closed (finished)
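The keep-alive idea might look something like the sketch below. It is not taken from either library: `pipeWithKeepAlive`, the `ping` callback and the 10 second default interval are all assumptions. In a page, `ping` could be something like `() => navigator.serviceWorker.controller.postMessage('ping')`.

```javascript
// Sketch: keep pinging while a stream is being piped, stop when it finishes.
// `pipeWithKeepAlive` and the `ping` callback are hypothetical names.
async function pipeWithKeepAlive(readable, writable, ping, intervalMs = 10000) {
  ping(); // first ping right away, so the worker's idle timer resets at the start
  const timer = setInterval(ping, intervalMs);
  try {
    await readable.pipeTo(writable);
  } finally {
    clearInterval(timer); // stop pinging once the stream closes or errors
  }
}
```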

The native file system API (only available in Blink behind a flag) doesn't require a service worker; you get a write handle to write chunks, blobs, strings, ArrayBuffers and ArrayBufferViews to.

benwiley4000 commented 3 years ago

@jimmywarting thanks for the info. BTW to whomever marked that as off-topic, I don't think it's off-topic since it's related to the need and potential workarounds for URL.createObjectURL(stream).