sbarzowski opened 4 years ago
To be clear, the current behavior encodes the string as Latin-1, is that right? And if a character is outside Latin-1, its Unicode codepoint is above 255 and it errors. Whereas a more sensible default would be to encode it as UTF-8, which yields the same result for ASCII, a different result within the rest of Latin-1, and actually produces something useful outside of Latin-1.
Currently base64 takes the Unicode codepoints directly. If any codepoint is greater than 255, it errors out. It seems that Unicode codepoints <= 255 are the same as the original Latin-1 (aka ISO 8859-1), yes. I hadn't noticed that before. So at least it's a fairly well-known encoding.
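To make the difference concrete, here is a minimal Python sketch (not the actual Jsonnet implementation) of the current codepoints-as-bytes behavior, compared against UTF-8. The function name `base64_codepoints` is hypothetical, purely for illustration:

```python
import base64

def base64_codepoints(s):
    # Mimic the current behavior: treat each Unicode codepoint as one byte.
    # Codepoints <= 255 coincide with Latin-1; anything higher errors out.
    cps = [ord(c) for c in s]
    if any(cp > 255 for cp in cps):
        raise ValueError("codepoint > 255 cannot be represented as a byte")
    return base64.b64encode(bytes(cps)).decode("ascii")

print(base64_codepoints("abc"))  # "YWJj" -- identical to UTF-8 for ASCII
print(base64_codepoints("é"))    # "6Q==" -- Latin-1 single byte 0xE9
print(base64.b64encode("é".encode("utf-8")).decode("ascii"))  # "w6k=" -- two bytes 0xC3 0xA9
# base64_codepoints("ż")  # U+017C is 380 > 255, so this raises
```

For ASCII input the two agree, for the rest of Latin-1 they silently diverge, and outside Latin-1 the current behavior fails outright.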
Yes, a more sensible default would be to encode it with UTF-8; our strings contain Unicode, after all. But we don't really have that option, because changing the default could silently break existing users. My proposal is to always require an explicit encoding (e.g. using std.encodeUTF8).
Applying base64 to a string currently uses a weird encoding. I think we should fail with an error explaining what to use instead (and how to recover the old behavior by naively encoding the string codepoint by codepoint).
Right now it's a major footgun, so I think breaking backwards compatibility is worth it here (especially since the fix is easy on the user side).