nolanlawson / blob-util

Cross-browser utils for working with binary Blobs
https://nolanlawson.github.io/blob-util
Apache License 2.0
503 stars 45 forks source link

Clean up media type in blobToDataUrl #61

Open Treora opened 4 years ago

Treora commented 4 years ago

Currently, blobToDataUrl simply puts blob.type into the URL (at this line), but it should be cleaned up to be valid:

I may tackle this myself sometime. For now I thought to just raise the issue already and park my research notes here. :)

nolanlawson commented 4 years ago

Would it be enough to URL-encode the type? I.e.

'data:' + encodeURIComponent(blob.type) + ';base64,' + base64String

?

It seems to me that if a Blob (in this case a File) has whitespace or funky characters in its type, then that's its type. Presumably the browser should treat that File exactly the same as if it were encoded into a data URI – if the browser massages the string or corrects for user errors, then great.

I don't believe it's the responsibility of blob-util to clean up all the ways that mime types can be messed up when coming from any server, but I do believe that blobToDataUrl should create a URL that is roughly equivalent to what you'd get from URL.createObjectURL() (i.e. a blob URL). So it seems to me that encoding the string should be sufficient.

nolanlawson commented 4 years ago

OTOH if the blob.url already contains ;base64 in it, then I'm not sure there's much we can do except implement an entire mime parser, which I really don't think should be the responsibility of blob-util. :slightly_smiling_face:

Treora commented 4 years ago

Would it be enough to URL-encode the type?

Unfortunately, no. That would also percent-encode characters of the media type that we should keep intact, such as the slash and semicolons. According to the data URL spec quoted above, and especially when following the reasoning of this recently suggested erratum, each parameter value that is a quoted string should be stripped of its quotes and then percent-encoded where needed.

This would indeed require some parsing (especially as a quoted value can again contain a quote if preceded by a backslash, according to the quoted-string syntax used in the media type spec (rfc 7231). But since it seems quotes can only appear either around or inside parameter values (but backslash-escaped in the latter case), I wonder if some regex might actually suffice. A quick attempt:

function reencodeMediatype(type) {
    return type.replace(
        /"((\\"|[^"])+)"/g, // find any string within quotes (possibly containing backslash-escaped quotes)
        (_, m) => encodeURIComponent(
            m.replace(/\\(.?)/g, '$1')
        ), // unescape all backslash-escaped characters; then percent-encode the value instead.
    );
}

Quite possibly I will have overlooked some complication; regexes do not usually suffice for correctly parsing things. In any case, I agree this should not be a responsability of blob-util. I hope somebody already packaged and published a correct implementation somewhere that could just be used here, or perhaps it’s time to make one. On NPM I can only find the modules attempting to do the reverse (e.g. parse-data-url, data-urls); not sure where else to search.


OTOH if the blob.url already contains ;base64 in it, …

(presuming you meant blob.type)

…then, firstly, it would not be a valid media type parameter anyway as it lacks a =; as the spec says:

The ";base64" extension is distinguishable from a content-type parameter by the fact that it doesn't have a following "=" sign.

…and secondly, the current implementation would simply append a second ;base64, to the end, which presumably a receiving end would strip off to end up with the original (although invalid) type again. (a quick look at this library’s dataUrlToBlob() implementation reveals however that it throws away all of the media type parameters! And blindly assumes base64 encoding. Separate issue.)


Although making modules implement the details correctly seems worthwhile to avoid incompatibilities further down the line, I guess this might (at least for myself) not have the highest priority to dive even deeper into. Personally, I would probably just consider quoted media type parameters as unsupported for the time being, and fix the whitespace issue by just removing all whitespace from the type: blob.type.replace(/\s/g,'')

nolanlawson commented 4 years ago

This sounds like a tricky issue. I guess since blob-util is modular, it may be worth fixing. But your workaround also works fine and it seems like a bit of a edge-case, so maybe that's okay. I'm a bit on-the-fence about this.

I'm also nervous about the proposed regex fix, unless it can really handle all possible edge cases (with tests to demonstrate :slightly_smiling_face:).