whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

base64url variant of btoa/atob #351

Open beverloo opened 9 years ago

beverloo commented 9 years ago

The HTML specification has two methods for converting between a unicode string to a base64-encoded representation of it, and vice versa.

https://html.spec.whatwg.org/multipage/webappapis.html#dom-windowbase64-btoa

The URL-safe base64 encoding (base64url), also defined in RFC 4648, has recently been adopted by a few specifications. Examples include the Push API (PushSubscription serialization) and various parameters of JWK objects (EME, Web Crypto).

https://tools.ietf.org/html/rfc4648#section-5

While these values aren't primarily intended for consumption by the web app, apps that do want to consume them now need their own conversion methods (trivial as those may be).

The naming of btoa/atob doesn't make them very extensible. We could either add an argument (optional boolean urlsafe = false) or introduce analogous methods for the other encoding, e.g. urlbtoa/urlatob. I prefer the former.
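
To make the optional-argument idea concrete, here is a rough userland sketch built on the existing btoa/atob. The names encodeBase64 and decodeBase64 are hypothetical and appear in no specification, and stripping the trailing "=" padding in the URL-safe form is an assumption made for this example.

```javascript
// Hypothetical helpers, not part of any specification.
// `urlsafe` mirrors the optional boolean argument discussed in this thread.
function encodeBase64(binaryString, urlsafe = false) {
  const encoded = btoa(binaryString);
  if (!urlsafe) return encoded;
  // RFC 4648 section 5 alphabet: '+' becomes '-', '/' becomes '_'.
  // Dropping the '=' padding is an assumption of this sketch.
  return encoded.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

function decodeBase64(encoded, urlsafe = false) {
  if (!urlsafe) return atob(encoded);
  // Map the URL-safe alphabet back to the standard one.
  const standard = encoded.replace(/-/g, '+').replace(/_/g, '/');
  // Restore any padding that was stripped during encoding.
  const padded = standard + '='.repeat((4 - (standard.length % 4)) % 4);
  return atob(padded);
}

// The two bytes 0xFB 0xFF exercise both substituted characters.
encodeBase64('\xfb\xff');       // → '+/8='
encodeBase64('\xfb\xff', true); // → '-_8'
```

Like btoa/atob themselves, these helpers take and return strings in which each character stands for one byte.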

I'd be happy to generate a pull request if you think adding these makes sense.

mathiasbynens commented 8 years ago

Minor correction:

“methods for converting between a unicode string to a base64-encoded representation of it”

btoa and atob act on “byte strings”: strings in which every character stands for a single byte (code points U+0000 to U+00FF). To convert a Unicode string to base64, encode it first using an encoding of your choice, e.g. UTF-8:

const unicodeString = 'foo𝌆bar';
const textEncoder = new TextEncoder(); // always UTF-8; the constructor takes no arguments
const bytes = textEncoder.encode(unicodeString);
const byteString = String.fromCodePoint(...bytes);
// Well, that was awkward. But now we can finally base64-encode!
const encoded = btoa(byteString);
// → 'Zm9v8J2MhmJhcg=='

If btoa/atob were to be designed today, they’d probably accept/produce Uint8Arrays of bytes (which is what TextEncoder outputs). The new methods should probably do this, unless consistency with btoa/atob is more important.
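
As a rough illustration of what a bytes-based design might look like, the sketch below wraps the existing btoa/atob so that callers only ever see Uint8Arrays and base64 strings. The function names bytesToBase64 and base64ToBytes are hypothetical; nothing like them is specified in this thread.

```javascript
// Hypothetical names; a sketch layered on top of the existing btoa/atob.

// Uint8Array → base64 string.
function bytesToBase64(bytes) {
  let byteString = '';
  // Build the string one byte at a time rather than spreading the array
  // into String.fromCharCode, which can hit argument-count limits on
  // large inputs.
  for (const byte of bytes) byteString += String.fromCharCode(byte);
  return btoa(byteString);
}

// base64 string → Uint8Array.
function base64ToBytes(encoded) {
  const byteString = atob(encoded);
  // Every character in atob's output has a code unit in the range 0..255.
  return Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
}

// Round trip through both layers: text → bytes → base64 → bytes → text.
const base64 = bytesToBase64(new TextEncoder().encode('foo𝌆bar'));
const text = new TextDecoder().decode(base64ToBytes(base64));
```

With this shape the awkward intermediate string stays inside the helper, and the output of TextEncoder can be passed straight in.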

domenic commented 8 years ago

This seems like a proposal that's more appropriate for the Encoding Standard, in any case? Unless I am misunderstanding.

mathiasbynens commented 8 years ago

@domenic I’m not sure. There are two layers of encoding here:

  1. Text encoding (text → bytes), as provided by the Encoding Standard;
  2. Base64 encoding (bytes → base64-encoded “text”), which operates on bytes, not text.

domenic commented 8 years ago

@mathiasbynens yeah, that's true. I guess I was reacting to how the Encoding Standard properly separates out byte inputs/outputs from string inputs/outputs. But I see that it doesn't have any methods that are bytes -> bytes, like this would be.

annevk commented 8 years ago

A base64 encoder is similar to a text decoder, seems like. Should we just introduce a Base64Encoder/Base64Decoder pair that has a similar design to the classes from the Encoding Standard?
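
One way to picture such a pair is sketched below. These classes are purely hypothetical: the { urlsafe } option, the method names, and the unpadded URL-safe output are all assumptions of this example, and it is built on btoa/atob for brevity only. Following the analogy above, Base64Encoder.encode takes bytes and returns a string, much as TextDecoder.decode does.

```javascript
// Hypothetical classes; no such interface exists in any standard.
class Base64Encoder {
  constructor({ urlsafe = false } = {}) {
    this.urlsafe = urlsafe;
  }

  // Uint8Array → base64 (or base64url) string.
  encode(bytes) {
    let byteString = '';
    for (const byte of bytes) byteString += String.fromCharCode(byte);
    const encoded = btoa(byteString);
    if (!this.urlsafe) return encoded;
    // Swap in the URL-safe alphabet and drop padding (an assumption).
    return encoded.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
  }
}

class Base64Decoder {
  constructor({ urlsafe = false } = {}) {
    this.urlsafe = urlsafe;
  }

  // base64 (or base64url) string → Uint8Array.
  decode(encoded) {
    let standard = encoded;
    if (this.urlsafe) {
      standard = encoded.replace(/-/g, '+').replace(/_/g, '/');
      // Restore padding that a URL-safe encoder may have stripped.
      standard += '='.repeat((4 - (standard.length % 4)) % 4);
    }
    return Uint8Array.from(atob(standard), (ch) => ch.charCodeAt(0));
  }
}

const encoder = new Base64Encoder({ urlsafe: true });
const decoder = new Base64Decoder({ urlsafe: true });
const token = encoder.encode(new Uint8Array([0xfb, 0xff])); // → '-_8'
const roundTripped = decoder.decode(token);
```

Taking the variant as a constructor option keeps the encode/decode signatures identical across alphabets, which is one possible reading of "a similar design to the classes from the Encoding Standard".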