Open martinvuyk opened 2 months ago
Note that I did not report it as a bug because the documentation does not seem to imply that whitespace is handled.
A relevant specification is WHATWG Forgiving Base64 decoding:
https://infra.spec.whatwg.org/#forgiving-base64-decode
C#/.NET follows it, as does JavaScript's atob function. Other systems may follow it as well.
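To make the comparison concrete, here is a rough Python sketch of the forgiving-base64 decode steps as I read them in the WHATWG Infra spec linked above. The function name `forgiving_b64decode` is hypothetical, and this is an illustration only, not what any of the mentioned implementations actually run:

```python
import base64
import string

# ASCII whitespace as defined by the Infra spec: TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = "\t\n\f\r "
# Standard base64 alphabet, without the padding character.
ALPHABET = set(string.ascii_letters + string.digits + "+/")


def forgiving_b64decode(data: str) -> bytes:
    """Hypothetical helper sketching WHATWG forgiving-base64 decoding."""
    # Step 1: remove all ASCII whitespace.
    data = "".join(ch for ch in data if ch not in ASCII_WHITESPACE)

    # Step 2: if the length is a multiple of 4, drop one or two trailing '='.
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]

    # Step 3: a remainder of 1 can never encode whole bytes.
    if len(data) % 4 == 1:
        raise ValueError("invalid base64 length")

    # Step 4: anything outside the alphabet (including a leftover '=') fails.
    if any(ch not in ALPHABET for ch in data):
        raise ValueError("character outside the base64 alphabet")

    # Step 5: restore padding so the stdlib decoder accepts the input.
    padded = data + "=" * (-len(data) % 4)
    return base64.b64decode(padded)


print(forgiving_b64decode("Qm9 uam91cg=="))
```

Unlike Python's default mode, this rejects non-alphabet characters other than ASCII whitespace, and it accepts input whose padding is missing.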
I forgot to add the spec and what Python does, which is what we try to follow.
Python follows RFC 4648, Section 3.3:
Implementations MUST reject the encoded data if it contains
characters outside the base alphabet when interpreting base-encoded
data, unless the specification referring to this document explicitly
states otherwise. Such specifications may instead state, as MIME
does, that characters outside the base encoding alphabet should
simply be ignored when interpreting data
Python:
```python
from base64 import b64decode
print(b64decode("Qm9 uam91cg=="))
```
output:
```
b'Bonjour'
```
From the Python docs:
If validate is False (the default), characters that are neither in the normal base-64 alphabet nor
the alternative alphabet are discarded prior to the padding check. If validate is True, these
non-alphabet characters in the input result in a [binascii.Error](https://docs.python.org/3/library/binascii.html#binascii.Error).
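A small sketch of the two modes described in that passage, reusing the same input as the Python example above. The variable names are only illustrative:

```python
import binascii
from base64 import b64decode

encoded = "Qm9 uam91cg=="  # contains a space, which is outside the alphabet

# Default (validate=False): the space is discarded before decoding.
lenient = b64decode(encoded)

# validate=True: the same input is rejected with binascii.Error.
try:
    b64decode(encoded, validate=True)
    strict_error = None
except binascii.Error as exc:
    strict_error = exc

print(lenient)
print(strict_error is not None)
```

So Python's default is the lenient behavior that RFC 4648 allows a referring specification to opt into, while `validate=True` gives the strict rejection the RFC otherwise requires.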
Bug description
Detected by @lemire in PR #3443
output:
Steps to reproduce
System information