perry-mitchell / webdav-client

WebDAV client written in TypeScript for Node.js and the browser
MIT License

Incorrect Basic authentication string with Unicode special chars #293

Open n-peugnet opened 2 years ago

n-peugnet commented 2 years ago

When the Basic authentication string contains Unicode special chars (for instance é) it is incorrectly base64 encoded. As the documentation of the used base64 library states:

base64.encode(input)

This function takes a byte string (the input parameter) and encodes it according to base64. The input data must be in the form of a string containing only characters in the range from U+0000 to U+00FF, each representing a binary byte with values 0x00 to 0xFF. The base64.encode() function is designed to be fully compatible with btoa() as described in the HTML Standard.

To base64-encode any Unicode string, encode it as UTF-8 first:

var base64 = require('base-64');
var utf8 = require('utf8');

var text = 'foo © bar 𝌆 baz';
var bytes = utf8.encode(text);
var encoded = base64.encode(bytes);
console.log(encoded);
// → 'Zm9vIMKpIGJhciDwnYyGIGJheg=='

So any Unicode character outside of the 0x00 to 0xFF range will be incorrectly encoded. I think it is safe to assume that most web servers and authentication back-ends use UTF-8 to decode the authentication string, so I would say it should be fixed in this library rather than in the clients. If you don't want to change it, the docs should state that strings must be UTF-8 encoded before being passed to createClient().
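To illustrate the difference, here is a hypothetical demonstration using Node's Buffer (not the library's actual code); the string and its contents are made up:

```javascript
// Compare Latin-1 (btoa-style) and UTF-8 base64 encodings of the same string.
const text = 'user:p𝌆ss'; // 𝌆 is U+1D306, outside the U+0000–U+00FF range

// btoa-style encoding: 'latin1' silently truncates each UTF-16 code unit to one byte.
const latin1 = Buffer.from(text, 'latin1').toString('base64');

// Encoding as UTF-8 first produces the bytes a UTF-8 backend expects.
const utf8 = Buffer.from(text, 'utf8').toString('base64');

console.log(latin1 !== utf8); // true: the two results differ, and only the
                              // UTF-8 one round-trips back to the original
```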

I do think it's an easy fix on your side, as you just need to call utf8.encode in the toBase64() function (and utf8.decode in the corresponding fromBase64() function): https://github.com/perry-mitchell/webdav-client/blob/53d1a1cbac34b17d66996ff26f05868afd7e723f/source/tools/encode.ts#L23

It does require one more dependency and creates a breaking change though.
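The suggested change could look roughly like this (a minimal sketch using Node's Buffer for illustration; the library itself uses the base-64 and utf8 packages in source/tools/encode.ts):

```javascript
// Sketch of UTF-8-aware base64 helpers mirroring toBase64()/fromBase64().
function toBase64(text) {
    // Encode to UTF-8 bytes first so characters above U+00FF survive.
    return Buffer.from(text, 'utf8').toString('base64');
}

function fromBase64(encoded) {
    // Reverse: base64 -> bytes -> UTF-8 string.
    return Buffer.from(encoded, 'base64').toString('utf8');
}

console.log(toBase64('foo © bar 𝌆 baz')); // → 'Zm9vIMKpIGJhciDwnYyGIGJheg=='
```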

perry-mitchell commented 2 years ago

@n-peugnet Thanks a lot for this, and for the link from your implementation. I'd definitely like this added as I'm sure many have or will experience this. I'm not worried about an extra dependency in this case. Why do you feel it would be a breaking change?

n-peugnet commented 2 years ago

I assumed that calling utf8.encode twice in a row on the same string would produce an incorrect string too, and that this would break clients that had already fixed it on their side, as I did. But it may be fine; I didn't test it.
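The double-encoding concern can be demonstrated with Node's Buffer (a hypothetical example, not the library's code):

```javascript
// UTF-8-encoding a string that is already UTF-8-encoded corrupts it.
const once = Buffer.from('é', 'utf8'); // é (U+00E9) as UTF-8: 0xC3 0xA9

// Treat those two bytes as a string and UTF-8-encode again:
// U+00C3 -> 0xC3 0x83 and U+00A9 -> 0xC2 0xA9, i.e. four bytes.
const twice = Buffer.from(once.toString('latin1'), 'utf8');

console.log(once.length, twice.length); // 2 4
```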

n-peugnet commented 2 years ago

After further research, UTF-8 encoding is not yet the default according to RFC 7617 (Appendix B.3), but all major browsers seem to have switched to it by default (Opera since at least 2009, Chrome since at least 2017, and Firefox since 2018). I could not find data about Safari using it by default.

The MDN docs state:

Browsers use utf-8 encoding for usernames and passwords.

Firefox once used ISO-8859-1, but changed to utf-8 for parity with other browsers and to avoid potential problems as described in bug 1419658.

This is definitely a breaking change, as UTF-8 is byte-compatible with the previous default (ISO-8859-1) only for Unicode chars 0 to 127. So changing it would break existing auth using chars 128 to 255.

By the way, I made a mistake. The character that caused me trouble, é, is in fact U+00E9 (code point 233), so my initial thought was wrong: it is in the 00-FF range, and it was correctly base64-encoded according to the ISO-8859-1 charset. The real problem was that my backend (Apache + mod_authnz_ldap + OpenLDAP) expected UTF-8.
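This case can be shown concretely with Node's Buffer (a hypothetical demonstration; both encodings "work", they just disagree):

```javascript
// é is U+00E9: inside the 00-FF range, so btoa-style (Latin-1) encoding
// succeeds, but it produces a different byte than a UTF-8 backend expects.
const eAcute = 'é';
const latin1 = Buffer.from(eAcute, 'latin1'); // one byte:  0xE9
const utf8 = Buffer.from(eAcute, 'utf8');     // two bytes: 0xC3 0xA9

console.log(latin1.toString('base64')); // '6Q=='
console.log(utf8.toString('base64'));   // 'w6k='
```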

n-peugnet commented 2 years ago

Also, if you want to make a clean change by making UTF-8 the default encoding for auth, it would be a good idea to change it in Digest auth as well. So updating toBase64() alone will not be sufficient.

perry-mitchell commented 3 months ago

This is ancient but happy to see a PR on this, if someone has time. I'll most likely close it otherwise due to lack of interest/concern.