w3c / FileAPI

File API
https://w3c.github.io/FileAPI/

Resolve Blob to an existing ArrayBuffer #143

Status: Closed (Zhang-Junzhi closed this issue 4 years ago)

Zhang-Junzhi commented 4 years ago

Currently, Blobs can only be resolved to a newly created ArrayBuffer.

Sometimes it would be much more efficient if a Blob could be resolved directly into an existing ArrayBuffer (provided the ArrayBuffer is large enough). For example, the contents of a very large file could be read directly into the ArrayBuffer backing WASM memory. Without this feature, we need to first call File.arrayBuffer() to resolve the large content to a newly created ArrayBuffer, and then copy it into WASM memory.
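For reference, a minimal sketch of the two-copy workaround described above, assuming the module's `WebAssembly.Memory` is reachable from JavaScript and that the caller has already reserved a region starting at `offset`. The helper name `copyBlobIntoWasmMemory` and its parameters are illustrative, not part of any API. `Blob.arrayBuffer()` makes the first copy and `Uint8Array.prototype.set()` makes the second.

```javascript
// Copy a Blob's bytes into WebAssembly memory the way the issue describes:
// first into a freshly allocated ArrayBuffer, then into the WASM heap.
// `memory` and `offset` are illustrative assumptions supplied by the caller.
async function copyBlobIntoWasmMemory(blob, memory, offset) {
  // First copy: Blob.arrayBuffer() resolves with a newly created ArrayBuffer.
  const temp = new Uint8Array(await blob.arrayBuffer());

  // Grow the memory if the data would not fit (WASM pages are 64 KiB).
  const needed = offset + temp.byteLength;
  if (needed > memory.buffer.byteLength) {
    const PAGE = 65536;
    memory.grow(Math.ceil((needed - memory.buffer.byteLength) / PAGE));
  }

  // Second copy: write the temporary bytes into the WASM heap.
  // Re-read memory.buffer here, because grow() detaches the old ArrayBuffer.
  new Uint8Array(memory.buffer, offset, temp.byteLength).set(temp);
  return temp.byteLength;
}
```

Usage, with a hypothetical `instance` whose exports include its memory: `await copyBlobIntoWasmMemory(file, instance.exports.memory, 1024)`. The temporary `Uint8Array` is exactly the extra allocation and copy the issue asks to avoid.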

annevk commented 4 years ago

Where is "resolve" defined?

What would the exact semantics of this be?

Zhang-Junzhi commented 4 years ago

Sorry, to clarify the definition:

"Resolve" here means returning a Promise that resolves with the contents of the blob as binary data contained in the ArrayBuffer, like Blob.arrayBuffer() or FileReader.readAsArrayBuffer().

annevk commented 4 years ago

Would there be a single write for all bytes?

Zhang-Junzhi commented 4 years ago

AFAIK, there's currently no way to resolve a Blob into an existing ArrayBuffer (I'm happy to be proven wrong).

If I use WebAssembly and want to read the contents of a file, I need to first call File.arrayBuffer() to resolve the large content to a newly created ArrayBuffer, and then copy it into WASM memory. This means the data is written twice.

annevk commented 4 years ago

I understand, I was basically asking if the writing would be done similarly to https://encoding.spec.whatwg.org/#dom-textencoder-encodeinto. So unless you're multi-threaded and use SharedArrayBuffer, you cannot observe a partially filled buffer.

Also, if we're doing this we should probably do it by changing the Body mixin in Fetch.
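For comparison, `TextEncoder.encodeInto()` from the linked Encoding spec writes into a caller-supplied `Uint8Array` and reports how much it consumed and wrote, instead of allocating a new buffer. A small sketch of that calling pattern (the `encodeIntoExisting` helper is illustrative only):

```javascript
// TextEncoder.encodeInto() writes UTF-8 bytes into an existing Uint8Array
// instead of allocating a new one, and returns { read, written }.
function encodeIntoExisting(text, target, offset = 0) {
  const encoder = new TextEncoder();
  // subarray() is a view on the same memory, so bytes land in `target`.
  const { read, written } = encoder.encodeInto(text, target.subarray(offset));
  // read: UTF-16 code units consumed; written: bytes stored in target.
  return { read, written };
}

const buffer = new Uint8Array(8);
const result = encodeIntoExisting("héllo", buffer, 2);
// result → { read: 5, written: 6 }  ("é" takes two bytes in UTF-8)
```

If the target is too small, `encodeInto()` stops at the last complete character that fits, and `read`/`written` tell the caller where to resume.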

Zhang-Junzhi commented 4 years ago

> I understand, I was basically asking if the writing would be done similarly to https://encoding.spec.whatwg.org/#dom-textencoder-encodeinto. So unless you're multi-threaded and use SharedArrayBuffer, you cannot observe a partially filled buffer.

> Also, if we're doing this we should probably do it by changing the Body mixin in Fetch.

I'm not sure which specific approach would be best for resolving this; I just wanted to raise the efficiency issue from my use case.

Zhang-Junzhi commented 4 years ago

> I understand, I was basically asking if the writing would be done similarly to https://encoding.spec.whatwg.org/#dom-textencoder-encodeinto. So unless you're multi-threaded and use SharedArrayBuffer, you cannot observe a partially filled buffer.

Did you mean I could use SharedArrayBuffer together with encodeInto across multiple threads, so WASM can start working while the buffer is still being copied into WASM memory, as a workaround to reduce latency?

But consider reading a 100MB file that is not in a streaming format: unless the whole 100MB of content is ready in WASM memory, partially ready content isn't of much value. In that case, SharedArrayBuffer + encodeInto still doesn't help much.

mkruisselbrink commented 4 years ago

If I understand what is being requested correctly, I believe you can mostly do that already. I.e. to read a Blob into a script-supplied (pre-allocated) ArrayBuffer, you can call Blob.stream() to get a ReadableStream, and then use a ReadableStreamBYOBReader to read from that stream into a script-supplied array buffer.
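The BYOB approach suggested above might look roughly like this sketch. It assumes the stream supports a BYOB reader (that is, it is a byte stream); where it isn't, `getReader({ mode: "byob" })` throws. One subtlety: each `read(view)` transfers the view's buffer, so the caller has to keep using the buffer from the returned view, and the original `ArrayBuffer` object ends up detached. As far as I understand, this also means a `WebAssembly.Memory` buffer, which cannot be detached, cannot be passed directly as the target. The `readIntoBuffer` helper is illustrative only.

```javascript
// Read a byte stream (e.g. blob.stream()) into a caller-supplied ArrayBuffer
// with a ReadableStreamBYOBReader. Returns the transferred buffer and the
// number of bytes filled.
async function readIntoBuffer(stream, buffer) {
  // Throws a TypeError if the stream is not a byte stream.
  const reader = stream.getReader({ mode: "byob" });
  let offset = 0;
  try {
    while (offset < buffer.byteLength) {
      const view = new Uint8Array(buffer, offset, buffer.byteLength - offset);
      const { value, done } = await reader.read(view);
      if (value) {
        // read() transferred the buffer; continue with the new ArrayBuffer.
        buffer = value.buffer;
        offset += value.byteLength;
      }
      if (done) break;
    }
  } finally {
    reader.releaseLock();
  }
  return { buffer, bytesRead: offset };
}
```

Usage: `const { buffer, bytesRead } = await readIntoBuffer(blob.stream(), new ArrayBuffer(blob.size));`. Only the returned `buffer` is usable afterwards.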

Zhang-Junzhi commented 4 years ago

Good method! Thanks for your reply. That is a concrete way to achieve my goal (though it seems none of the major browsers has implemented ReadableStreamBYOBReader yet).

Zhang-Junzhi commented 4 years ago

One more issue to take this topic further:

If WASM runs in a worker thread, two copies still seem unavoidable, since File blobs can only be used in the window thread: the WASM memory buffer cannot be detached and transferred to the window thread, and File objects in the window thread cannot be used in the worker thread.

annevk commented 4 years ago

You can message a File object to a worker. If that's not sufficient for some reason it might be best to open a separate issue for that as it seems different from reading a blob into an existing buffer.
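A minimal sketch of what annevk describes: the main thread posts the `File` object itself (not its bytes), and the worker reads the contents on its own side, where it could then copy them into its WASM memory. The helper names are hypothetical, and whether cloning a `File` copies its bytes is a separate question, raised later in this thread.

```javascript
// Main thread: send the File object itself rather than its bytes.
// In a browser, `port` could be a Worker created with new Worker(...).
function sendFile(port, file) {
  port.postMessage(file);
}

// Worker side: receive the File and read its contents there.
// In a dedicated worker, `port` could be `self`.
function receiveFileBytes(port) {
  return new Promise((resolve, reject) => {
    port.onmessage = async (event) => {
      try {
        const file = event.data; // a structured clone of the posted File/Blob
        resolve(new Uint8Array(await file.arrayBuffer()));
      } catch (err) {
        reject(err);
      }
    };
  });
}
```

Usage, assuming a hypothetical worker script `reader-worker.js`: the page calls `sendFile(new Worker("reader-worker.js"), file)`, and the worker calls `receiveFileBytes(self)`.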

Zhang-Junzhi commented 4 years ago

> You can message a File object to a worker.

Oops, I didn't know that. Thanks for pointing it out.

After checking the definition of File, and StructuredSerializeInternal in the HTML spec, I realised I had misunderstood File.

> If that's not sufficient for some reason it might be best to open a separate issue for that as it seems different from reading a blob into an existing buffer.

Since it's still connected to the topic "read Blob into an allocated ArrayBuffer" (it's just a special case of it), I decided to post it in the same issue.

jimmywarting commented 4 years ago

FYI, a ReadableStream is also transferable with postMessage():

    // Feature-detect support for transferable ReadableStreams.
    function supportsTransferableStreams() {
      try {
        const readable = new ReadableStream();
        const { port1 } = new MessageChannel();
        port1.postMessage(readable, [readable]); // throws if unsupported
        port1.close();
        return true;
      } catch (err) {
        return false;
      }
    }

But I think it only works in Chrome right now...

annevk commented 4 years ago

Let's dupe this into #83.

surferjeff commented 2 years ago

Would someone be willing to confirm that the following line doesn't copy the underlying file contents?

    ctx.postMessage(file);

If I understand @Zhang-Junzhi's comments correctly, they concluded that a file's contents aren't copied into another buffer when it is passed via postMessage().

I read the two documents linked to above and don't understand them well enough to come to the same conclusion.

Context: People like to drop huge files (~1GB) into my web application, and I need to post them to workers. Copying buffers exceeds Chrome's memory limits:

    DataCloneError
    Failed to execute 'postMessage' on 'Worker': Data cannot be cloned, out of memory.

annevk commented 2 years ago

@surferjeff currently it does say that it makes a full copy, but I don't think that's correct. Could you file a new issue on that?