Thank you @robyoder for the report.
This is actually expected behavior. The check_trailing_bits option configures the behavior for the trailing bits of the last byte (for inputs of valid length), not for trailing bytes. There is currently no convenience configuration to accept inputs of invalid length. It may be useful to add one; please open an issue if you would benefit from such a feature.
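To illustrate the distinction, here is a small sketch (using the BASE32_NOPAD constant; the inputs are made up for the example):

use data_encoding::{DecodeKind, BASE32_NOPAD};

fn main() {
    let mut spec = BASE32_NOPAD.specification();
    spec.check_trailing_bits = false;
    let base = spec.encoding().unwrap();
    // "MZXR" has a valid length (4 mod 8 == 4) but nonzero trailing bits in its last
    // character: rejected by default, accepted once the check is disabled.
    assert_eq!(base.decode(b"MZXR").unwrap(), b"fo");
    // "MZX" has an invalid length (3 mod 8 == 3): still rejected, and the error kind
    // is Length, not Trailing, because the data is never read.
    assert_eq!(base.decode(b"MZX").unwrap_err().kind, DecodeKind::Length);
}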
However, there is a way to decode inputs of invalid length by using the position of the Length error returned by decode (see the documentation). This has very small overhead and can be wrapped in a convenience truncate_decode function:
use data_encoding::{DecodeError, DecodeKind, Encoding};

fn truncate_decode(encoding: &Encoding, input: &[u8]) -> Result<Vec<u8>, DecodeError> {
    match encoding.decode(input) {
        // The position of a Length error is a valid input length, so retry on that prefix.
        Err(DecodeError { position, kind: DecodeKind::Length }) => encoding.decode(&input[..position]),
        // Any other result is returned unchanged.
        output => output,
    }
}
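For example (a sketch using the BASE32_NOPAD constant; the input is made up):

fn main() {
    // Nine characters is an invalid base32 length (9 mod 8 == 1), so decode fails with a
    // Length error at position 8, and truncate_decode retries on the first 8 characters.
    let output = truncate_decode(&data_encoding::BASE32_NOPAD, b"MZXW6YTBO").unwrap();
    assert_eq!(output, b"fooba");
}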
Sure, we can work around it and we will, but probably without running the decode function twice.
My question is this: what is the benefit of having check_trailing_bits for only the cases it currently covers? If some algorithm is generating base32 (or some other base) with trailing bits that are not zeros, isn't it plausible that it would also be generating base32 with trailing characters?
One such algorithm would be a naive secret generator that selects characters at random from the base32 alphabet. The final character in the string may represent 1-4 trailing bits, or a full 5 extra bits. I'm curious what kinds of algorithms check_trailing_bits is designed to accommodate if not one like this.
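For concreteness, such a generator might look something like this (a sketch only; the rand crate (0.8 API) and the names here are just for illustration):

use rand::Rng; // assumes the rand crate

const BASE32_ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

fn naive_secret(len: usize) -> Vec<u8> {
    let mut rng = rand::thread_rng();
    // Pick len characters uniformly from the alphabet; len is chosen freely, so the
    // result may have an invalid base32 length as well as nonzero trailing bits.
    (0..len).map(|_| BASE32_ALPHABET[rng.gen_range(0..BASE32_ALPHABET.len())]).collect()
}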
You're not running the decode function twice; you're just calling it twice on invalid input, and one of the calls returns immediately. truncate_decode has very low overhead, as I said (a few nanoseconds), because the length check is done before everything else; the content of the input is not read at all. This is actually one of the reasons why checking the length is a different check from checking the trailing bits: one doesn't look at the data while the other does.
The customization provided by the library is meant to configure the implementation, not to adapt to use-cases. This is because the number of possible use-cases is unbounded, while the number of decisions taken in the implementation is finite. So the right design, and the one currently taken by the library, is to provide fine-grained control over the implementation. It is the job of the user to define and use the specification adapted to their use-case.
The current configurations were added to make it possible to replicate the behavior of other implementations (like the GNU base64 and base32 programs) for compatibility reasons. I wasn't aware that some implementations accept invalid lengths and that some users rely on that. I'll add this as a configuration option (see #29).
The algorithm you describe doesn't need this configuration to be implemented efficiently, because encode_len, which returns the number of base32 characters needed to represent a given number of bytes, never returns an invalid length. Something like the following would work fine:
use data_encoding::Encoding;

fn generate(base: &Encoding, len: usize) -> Vec<u8> {
    // encode_len(len) is the (always valid) number of base32 characters for len bytes.
    let input = vec![b'F'; base.encode_len(len)]; // Randomize content.
    base.decode(&input).unwrap()
}
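For instance (a sketch with the BASE32_NOPAD constant; note that if the chosen length leaves trailing bits, the randomized final character can set them, so the encoding would then need check_trailing_bits = false):

fn main() {
    // encode_len(20) == 32 for base32, so 32 characters are generated and decode back to 20 bytes.
    let secret = generate(&data_encoding::BASE32_NOPAD, 20);
    assert_eq!(secret.len(), 20);
}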
When using spec.check_trailing_bits = false; with the base32 spec, I would expect any string of base32 characters to be decoded successfully, with any extra bits ignored. But this is not the case. For strings with a length congruent to 1, 3, or 6 (mod 8), a DecodeError is thrown. The above code produces the following output:
Playground link
I understand that by the canonical definition in the RFC, the second input is invalid base32, but so is the third, and I would expect them to be treated the same when it comes to check_trailing_bits. So for this example, I would expect the second line in the output to be identical to the first.
Is the current behavior in this circumstance intentional, or is it simply a matter of a length check being done first without concern for check_trailing_bits?
For reference, Google Authenticator for Android follows my expectation in this regard, as does Authy for iOS (in my own testing). In the comment linked there you can see that a "string of sixteen 7s ("7...7") and seventeen 7s both decode to the same byte array", which is not the case here.
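To show the difference I mean with this library (a hypothetical sketch assuming BASE32_NOPAD with check_trailing_bits disabled, not the code from the playground link):

use data_encoding::DecodeKind;

fn main() {
    let mut spec = data_encoding::BASE32_NOPAD.specification();
    spec.check_trailing_bits = false;
    let base = spec.encoding().unwrap();
    // Sixteen '7' characters are a valid length and decode to ten 0xFF bytes...
    assert_eq!(base.decode(&[b'7'; 16]).unwrap(), vec![0xFF_u8; 10]);
    // ...but seventeen (17 mod 8 == 1) are rejected with a Length error rather than truncated.
    assert_eq!(base.decode(&[b'7'; 17]).unwrap_err().kind, DecodeKind::Length);
}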