marshallpierce / rust-base64

base64, in rust
Apache License 2.0

InvalidByte on string containing unicodes #194

Closed soywod closed 2 years ago

soywod commented 2 years ago

Here is the encoded string:

UmU6IFlvdXIgQXBw4oCMbOKAjGUgSeKAjEQgaGFzIGJlZW4gbG9ja2XigIzigIxk4oCM4oCMIG9uIFRodXJzZGF5LCBNYXksIDE5IDIwMjIgW3JlZjpfMjU4NzQ2XQ====

When I try to decode it with this function:

use base64::{CharacterSet, Config};

// Standard alphabet, padding enabled, non-zero trailing bits tolerated.
fn decode_base64(encoded_bytes: Vec<u8>) -> Result<Vec<u8>> {
    let config = Config::new(CharacterSet::Standard, true).decode_allow_trailing_bits(true);
    let decoded_bytes = base64::decode_config(&encoded_bytes, config)?;
    Ok(decoded_bytes)
}

I get the error Err(InvalidByte(126, 61)) and I cannot determine why.
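
For readers puzzling over the same error: in the base64 crate, `InvalidByte` carries the offset of the offending byte and the byte itself, and byte 61 is ASCII `=`. The sketch below uses only the standard library to locate the padding in an input; `padding_info` is a hypothetical helper name, not part of the crate.

```rust
/// Returns the offset of the first `=` (if any) and the number of
/// trailing `=` bytes in `encoded`.
/// Hypothetical helper for illustration; not part of the base64 crate.
fn padding_info(encoded: &[u8]) -> (Option<usize>, usize) {
    // Offset of the first padding byte, counted from zero.
    let first = encoded.iter().position(|&b| b == b'=');
    // Number of consecutive `=` bytes at the end of the input.
    let trailing = encoded.iter().rev().take_while(|&&b| b == b'=').count();
    (first, trailing)
}
```

Applied to the string above, this reports the first `=` at offset 126 and four trailing `=` bytes, which lines up with the offset and byte in `InvalidByte(126, 61)`.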

Online tools like https://www.base64decode.org/ seem to be able to decode the string:

Re: Your App‌l‌e I‌D has been locke‌‌d‌‌ on Thursday, May, 19 2022 [ref:_258746]

If I manually remove the last two == from the string, I can also decode it with your lib. Any idea what is going on with this string?

PS: I noticed that the string is strangely built; it looks like it contains Unicode characters (could that be the cause?). Rust prints it this way:

"Re: Your App\u{200c}l\u{200c}e I\u{200c}D has been locke\u{200c}\u{200c}d\u{200c}\u{200c} on Thursday, May, 19 2022 [ref:_258746]"

marshallpierce commented 2 years ago

The standard base64 alphabet is only ASCII a-zA-Z0-9/+, with = for padding. Any other bytes are invalid. That website is probably just ignoring errors rather than reporting them. \u{200c} is the "zero width non-joiner" character. It is not valid base64, so any occurrences must be removed before decoding. Also, it's invalid to have 4 padding characters (only = or == are valid). Whatever is generating that base64 is doing some pretty weird stuff...
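
If the input cannot be fixed at the source, one option is to clean it up before handing it to a strict decoder. The following is a minimal sketch using only the standard library, assuming the standard alphabet; `normalize_base64` is a hypothetical helper, not something the base64 crate provides. It is deliberately lenient, so it can also hide genuinely corrupted input.

```rust
/// Drops every byte outside the standard base64 alphabet (including all
/// `=` bytes, wherever they appear), then re-adds the right amount of padding.
/// Hypothetical helper for illustration; not part of the base64 crate.
fn normalize_base64(input: &[u8]) -> Vec<u8> {
    let mut out: Vec<u8> = input
        .iter()
        .copied()
        .filter(|b| b.is_ascii_alphanumeric() || *b == b'+' || *b == b'/')
        .collect();

    match out.len() % 4 {
        2 => out.extend_from_slice(b"=="),
        3 => out.push(b'='),
        // 0 needs no padding. 1 can never be valid base64, so the data is
        // left untouched and the decoder is left to report the error.
        _ => {}
    }
    out
}
```

The result can then be passed to the decoder in place of the raw bytes. For example, `QQ====` becomes `QQ==`, and stray non-ASCII bytes in the encoded text are dropped.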

soywod commented 2 years ago

Thank you for your reply, I understand better now. I use your lib for an RFC 2047 decoder (used by an email client), and one user reported this parsing error to me (it comes from an email subject). The email domain is a wild jungle…