marshallpierce / rust-base64

base64, in rust
Apache License 2.0

Test vectors against malleable base64 encodings #203

Closed kchalkias closed 1 year ago

kchalkias commented 1 year ago

Awesome work on the recent support for canonical decoding. We should also have a test using the test vectors provided in Table 2 of the paper: https://eprint.iacr.org/2022/361.pdf

[Image: base64_test_vectors]
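For readers unfamiliar with the term, here is a minimal sketch of what "malleable" means for padded base64. It is not the rust-base64 implementation, and the inputs mentioned afterwards are illustrative rather than copied from the paper's table. The `decode` function and its `strict` flag are hypothetical names: when `strict` is true, any input whose unused trailing bits are non-zero is rejected.

```rust
/// Standard base64 alphabet (RFC 4648, section 4).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Maps one base64 symbol to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    ALPHABET.iter().position(|&a| a == c).map(|i| i as u32)
}

/// Hypothetical decoder for padded base64. With `strict` set, inputs whose
/// unused trailing bits are non-zero are rejected as non-canonical.
fn decode(input: &str, strict: bool) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return None; // this sketch only handles padded input
    }
    let mut out = Vec::with_capacity(bytes.len() / 4 * 3);
    for (i, quad) in bytes.chunks(4).enumerate() {
        let is_last = (i + 1) * 4 == bytes.len();
        // Count trailing '=' symbols; they may only end the final quantum.
        let pad = quad.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 || (pad > 0 && !is_last) {
            return None;
        }
        // Pack the data symbols into a 24-bit accumulator, left-aligned.
        let mut acc: u32 = 0;
        for &c in &quad[..4 - pad] {
            acc = (acc << 6) | sextet(c)?;
        }
        acc <<= 6 * pad as u32;
        // The low 8 * pad bits carry no data; canonical encodings zero them.
        if strict && acc & ((1u32 << (8 * pad)) - 1) != 0 {
            return None;
        }
        out.push((acc >> 16) as u8);
        if pad < 2 {
            out.push((acc >> 8) as u8);
        }
        if pad < 1 {
            out.push(acc as u8);
        }
    }
    Some(out)
}
```

With this sketch, `decode("YQ==", true)` and `decode("YR==", false)` both yield the single byte `a`, while `decode("YR==", true)` returns `None` because the last symbol carries non-zero bits that never reach the output.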
ia0 commented 1 year ago

Interesting that the paper doesn't mention the data-encoding crate (as old as base64 and stable for 5 years), which provides an Encoding::is_canonical function specifically for this purpose. The data-encoding.rs playground also provides this information when designing an encoding. The documentation explains the concept of canonical encoding in the basics of specifying an encoding, to warn users about it. The library documentation also has a table describing the properties of the crate, which is similar to the one in the paper but more complete. (Note that the base64 crate comparison refers to the state before the recent PRs. I've updated the table in the repository but didn't publish a patch version since it's only documentation.)

Also the paper doesn't seem to mention why padding is useful, and thus when it makes sense to use an encoding with padding. Most of the time (actually almost always), padding is useless and should not be used.

kchalkias commented 1 year ago

@ia0 indeed, data-encoding was not included in the paper. I guess the main objective back then was to compare the default behaviour across multiple languages, Rust just being one of them with 1 or 2 popular candidate libs per lang.

Page 5 in the paper explains applications where this is important. We even managed to break into one of the most popular ticketing services and purchase tickets for free, mount DoS attacks on systems resulting in memory leaks, and break idempotence in protocols that expect deterministic payloads (i.e. signatures over canonical JSON), bypassing log detection for conceptually duplicated payloads.

It all starts from the unfortunate misconception where the vast majority of devs still believe base64 is always canonical (agreed that the issue is clearly mentioned in some docs, but it seems many miss this detail in practice).
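As a rough illustration of the duplicate-detection problem (a sketch under assumptions, not one of the attacks from the paper), the code below pairs a deliberately lenient decoder with two hypothetical replay guards. `decode_lenient`, `accepted_by_string`, and `accepted_by_bytes` are made-up names for this example. One guard remembers the raw base64 strings it has seen, the other remembers the decoded bytes.

```rust
use std::collections::HashSet;

/// Standard base64 alphabet (RFC 4648, section 4).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Deliberately lenient decoder: skips '=' and silently drops leftover bits,
/// so several distinct strings map to the same bytes.
fn decode_lenient(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let (mut acc, mut nbits) = (0u32, 0u32);
    for c in input.bytes().filter(|&c| c != b'=') {
        let v = ALPHABET.iter().position(|&a| a == c)? as u32;
        acc = (acc << 6) | v;
        nbits += 6;
        if nbits >= 8 {
            nbits -= 8;
            out.push((acc >> nbits) as u8);
            acc &= (1u32 << nbits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Hypothetical replay guard keyed on the encoded string.
/// Returns how many payloads it treats as new.
fn accepted_by_string(payloads: &[&str]) -> usize {
    let mut seen = HashSet::new();
    let mut accepted = 0;
    for &p in payloads {
        if seen.insert(p) {
            accepted += 1;
        }
    }
    accepted
}

/// Hypothetical replay guard keyed on the decoded bytes.
fn accepted_by_bytes(payloads: &[&str]) -> usize {
    let mut seen = HashSet::new();
    let mut accepted = 0;
    for &p in payloads {
        if let Some(bytes) = decode_lenient(p) {
            if seen.insert(bytes) {
                accepted += 1;
            }
        }
    }
    accepted
}
```

Given the payloads `"YQ=="`, `"YR=="`, and `"YS=="`, the string-keyed guard accepts all three as distinct, while the byte-keyed guard accepts only one, since all three decode to the same single byte under the lenient decoder.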

ia0 commented 1 year ago

> Rust just being one of them with 1 or 2 popular candidate libs per lang

I see. I would still argue that if popularity was measured by all-time downloads then the order is:

But I understand that crates.io is not simple to search, which I find a bit problematic.

> Page 5 in the paper explains applications where this is important

Sorry, I didn't mean "when canonical encoding is useful" but "why padding is useful". The exploits described in the paper are impressive. I've actually updated the documentation of data-encoding to mention that non-canonical encoding may open attack vectors by pointing to the paper.

My understanding of padding is here. It's only useful when you need to send a continuous unbuffered stream. I reverse-engineered this from the GNU base64 behavior, and I find it quite logical. This means devs should almost never use padding. I think the vast majority of devs not only don't understand what a canonical encoding is, but also don't understand when to use padding. It is sad that most implementations provide padding by default. It's almost like a gimmick.
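One way to read the streaming argument, offered as an assumption since the linked explanation is not reproduced here: with padding, independently encoded chunks can be concatenated and still decoded quantum by quantum, whereas unpadded chunks merge into a different message. The sketch below uses hypothetical helper names (`encode`, `decode_stream`) and is not taken from GNU base64 or any crate.

```rust
/// Standard base64 alphabet (RFC 4648, section 4).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes `data`, appending '=' padding only when `pad` is true.
fn encode(data: &[u8], pad: bool) -> String {
    let mut out = String::new();
    for chunk in data.chunks(3) {
        // Left-align up to three bytes in a 24-bit accumulator.
        let mut acc = 0u32;
        for (i, &b) in chunk.iter().enumerate() {
            acc |= (b as u32) << (16 - 8 * i);
        }
        let symbols = chunk.len() + 1; // 1 byte -> 2 symbols, 2 -> 3, 3 -> 4
        for i in 0..symbols {
            out.push(ALPHABET[((acc >> (18 - 6 * i)) & 0x3f) as usize] as char);
        }
        if pad {
            for _ in symbols..4 {
                out.push('=');
            }
        }
    }
    out
}

/// Decodes a stream four symbols at a time, treating every quantum
/// independently so that padding may appear in the middle of the stream.
/// Trailing bits are not checked in this sketch.
fn decode_stream(stream: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for quad in stream.as_bytes().chunks(4) {
        let symbols: Vec<u8> =
            quad.iter().copied().take_while(|&c| c != b'=').collect();
        // A quantum needs at least two symbols, and only '=' may follow them.
        if symbols.len() < 2 || quad[symbols.len()..].iter().any(|&c| c != b'=') {
            return None;
        }
        let mut acc = 0u32;
        for (i, &c) in symbols.iter().enumerate() {
            let v = ALPHABET.iter().position(|&a| a == c)? as u32;
            acc |= v << (18 - 6 * i);
        }
        for i in 0..symbols.len() - 1 {
            out.push((acc >> (16 - 8 * i)) as u8);
        }
    }
    Some(out)
}
```

Encoding `a` and `b` separately with padding and concatenating gives `YQ==Yg==`, which `decode_stream` turns back into `ab`. Without padding the concatenation is `YQYg`, which is a valid encoding of three unrelated bytes (`0x61 0x06 0x20`), so the chunk boundary is lost. When the whole message is encoded at once, as in most applications, there is no boundary to preserve.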