google / jsonnet

Jsonnet - The data templating language
http://jsonnet.org
Apache License 2.0
6.95k stars 438 forks source link

Forbid base64(string) in favor of base64(array) #815

Open sbarzowski opened 4 years ago

sbarzowski commented 4 years ago

Applying base64 on string uses a weird encoding. I think we should fail with an error explaining what to use instead (and how to get the old behavior by naively encoding the string).

Right now it's a major footgun, so I think breaking backwards compatibility is worth it here (especially since the fix is easy on the user side).

sparkprime commented 4 years ago

To be clear, the current behavior encodes it as Latin1, is that right? And if it's not within Latin1 the unicode will generate higher codepoints and it will error. Whereas a more sensible default is to encode it with UTF-8, which would yield the same if it's ASCII but different within Latin 1 (and actually produce something useful outside of Latin1)

sbarzowski commented 4 years ago

Currently base64 takes the unicode codepoints. If it's greater than 255 it errors out. It seems like Unicode codepoints <=255 are the same as the original latin-1 (aka ISO 8859-1), yes. I didn't notice that before. So at least it's a fairly well-known encoding.

Yes, a more sensible default is to encode it with UTF-8. Our strings contain Unicode after all. We don't really have that option – that could silently break users. My proposition is to always require an explicit encoding (e.g. using std.encodeUTF8).