w3c / FileAPI

File API
https://w3c.github.io/FileAPI/

Add option to Blob constructor to skip UTF-8 encoding #102

Closed: zbjornson closed this issue 6 years ago

zbjornson commented 6 years ago

Step 4.2.1 of the Blob constructor algorithm UTF-8 encodes the source string. This causes problems when the source string already holds literal bytes, one per code unit (i.e. it is a "binary string"). (See https://stackoverflow.com/questions/23795034 for an example.)

It would be nice to have a new option to the constructor, like literal (defaults to false), or utf8encode (defaults to true) to avoid having to manually convert the string to a BufferSource (Uint8Array).
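To make the problem concrete, here is a minimal sketch of what happens to a binary string today. It assumes an environment with global `Blob` and `TextEncoder` (modern browsers, or Node.js 18+); the variable names are illustrative only.

```javascript
// A "binary string": every code unit stands for exactly one byte (0x00-0xFF).
// These four values are the first bytes of a PNG file.
const binaryString = String.fromCharCode(0x89, 0x50, 0x4e, 0x47);

// TextEncoder applies the same UTF-8 encoding that Blob applies to string parts.
const utf8Bytes = new TextEncoder().encode(binaryString);

// 0x89 is above 0x7F, so UTF-8 encodes it as two bytes (0xC2 0x89).
console.log(binaryString.length); // 4
console.log(utf8Bytes.length);    // 5

// The Blob therefore holds 5 bytes instead of the intended 4.
const blob = new Blob([binaryString]);
console.log(blob.size);           // 5
```

Any code unit in the range 0x80-0xFF grows to two bytes, so binary payloads passed as strings end up corrupted rather than merely re-encoded.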

inexorabletash commented 6 years ago

More precisely, the request is for an option that treats input strings as ByteString rather than USVString. Since strings are sequences of 16-bit code units, they need to be interpreted somehow to become bytes, whether it's interpreting as UTF-16 and transcoding to UTF-8 (as is done today), storing as UTF-16LE or UTF-16BE (plausible, but let's not), or truncating to 8-bit values (per the request). Presumably this would follow the behavior for ByteString and throw on code units > 0xFF.

Seems reasonable. But on the other hand, why are libraries putting binary data in strings in the first place? Are these mostly older libraries that predate ArrayBuffer? Old APIs like atob()? Should we extend the web platform just to support old libraries, especially when this can easily be worked around in userspace? (new Uint8Array(string.split('').map(c=>c.charCodeAt(0))))
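Note that the one-line workaround above silently wraps any code unit above 0xFF modulo 256, because that is how `Uint8Array` stores out-of-range values. A stricter variant that mimics the ByteString behavior mentioned earlier (throwing on code units > 0xFF) might look like the sketch below. `byteStringToBytes` is a hypothetical helper name, not part of any specification or library.

```javascript
// Hypothetical helper: convert a binary string to bytes, one byte per code unit,
// rejecting anything that does not fit in 8 bits (ByteString-like behavior).
function byteStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit > 0xff) {
      throw new TypeError(
        `Code unit 0x${unit.toString(16)} at index ${i} does not fit in one byte`
      );
    }
    bytes[i] = unit;
  }
  return bytes;
}

// Usage: pass the typed array to Blob instead of the string,
// so no UTF-8 encoding step is applied.
const pngMagic = byteStringToBytes("\x89PNG");
const strictBlob = new Blob([pngMagic]);
console.log(strictBlob.size); // 4
```

Throwing rather than truncating surfaces the case where a real text string was passed by mistake.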

zbjornson commented 6 years ago

The case we hit was creating a File from a data: URL (from canvas.toDataURL). That data: URL came from a library that supports browsers that don't have canvas.toBlob, so yes, in this case it's a compat thing and possibly not worth the trouble of implementing.

(Are there other creators of data: URLs where better APIs don't exist?)
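For the data: URL case described above, a userspace conversion could be sketched as follows. This assumes a base64-encoded data: URL of the kind `canvas.toDataURL()` returns, plus global `atob` and `Blob`; `dataUrlToBlob` is a hypothetical helper name, and the parsing is deliberately minimal (it does not handle percent-encoded, non-base64 data: URLs).

```javascript
// Hypothetical helper: turn a base64 data: URL into a Blob without going
// through the Blob constructor's UTF-8 encoding of strings.
function dataUrlToBlob(dataUrl) {
  const comma = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || comma === -1) {
    throw new TypeError("Not a data: URL");
  }
  const header = dataUrl.slice(5, comma); // e.g. "image/png;base64"
  if (!header.endsWith(";base64")) {
    throw new TypeError("Only base64 data: URLs are handled in this sketch");
  }
  const mimeType = header.split(";")[0] || "text/plain";

  // atob() returns a binary string: one code unit per decoded byte.
  const binaryString = atob(dataUrl.slice(comma + 1));

  // Copy each code unit into a byte; atob() never yields values above 0xFF.
  const bytes = Uint8Array.from(binaryString, (c) => c.charCodeAt(0));
  return new Blob([bytes], { type: mimeType });
}

// "iVBORw==" is the base64 encoding of the bytes 0x89 0x50 0x4E 0x47.
const pngBlob = dataUrlToBlob("data:image/png;base64,iVBORw==");
console.log(pngBlob.size, pngBlob.type); // 4 "image/png"
```

In a browser, the resulting Blob can then be wrapped with `new File([pngBlob], "image.png", { type: pngBlob.type })` if a File is needed; the file name here is only a placeholder.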

annevk commented 6 years ago

I don't think there are other creators, but if there are we should fix those to allow them to return Blob objects instead.

I'd prefer not fixing this here as this is really a problem with the input and not the API.