Decoding ASN.1 signed integers always produces unsigned result

lilyball commented 3 years ago

BerObject doesn't seem to have any way of actually parsing a negative ASN.1 integer as a negative number. It only exposes .as_u32() and .as_u64(), both of which simply assume the whole thing is unsigned (even though ASN.1 integers are signed and BER/DER-encoding represents them as two's complement integers). Because of the variable length of an integer I can't even cast the result to a signed value, as e.g. -123 encodes as [0x85] which is then just 133 (and therefore casting to a signed integer preserves the 133 representation).

The only nod towards signed integers is the existence of .as_bigint() in addition to .as_biguint(). Unfortunately (and completely undocumented), the result is always a positive BigInt, as the implementation explicitly provides the positive sign instead of using the constructor that detects the sign from the bytes.

Example:

DerObject::from_int_slice(&BigInt::from(42).to_signed_bytes_be()).as_bigint().unwrap() // returns 42
DerObject::from_int_slice(&BigInt::from(-123).to_signed_bytes_be()).as_bigint().unwrap() // returns 133

chifflier commented 3 years ago

You're right, signed integers cannot be decoded for the moment. I'm planning to add this for the next release.

lilyball commented 3 years ago

An ASN.1 integer is defined as a signed integer so I’m not sure why this library even allows for parsing it as an unsigned one (and defaults to that everywhere, such as when reading a certificate’s serial)

chifflier commented 3 years ago

I encountered this while working on an alternative crate for ASN.1 parsing (asn1-rs, which may be interesting for you). Note: I intend to maintain both crates. Signed integers parsing is done in https://github.com/rusticata/asn1-rs/blob/master/src/asn1_types/integer.rs, and I think I'll copy the relevant parts in this crate.

The "classical" method is to encode it as the memory representation of a signed integer. This is not part of the specifications. Negative numbers will start by 0xff, which means an extra 0x00 has to be inserted before (MSB cannot be 1 in the first content byte). For example, -1 can be encoded as 02 02 00 ff.

lilyball commented 3 years ago

The "classical" method is to encode it as the memory representation of a signed integer. This is not part of the specifications.

I'm not sure what you mean by this.

X.680 (ASN.1 notation) defines IntegerType such that its value is either a SignedNumber or an identifier from the corresponding IntegerType definition's named number lists (which are also signed numbers). Or in short, ASN.1 integers are signed. The older X.208 agrees.

X.690 (ASN.1 BER and DER encoding) says that the encoding of an integer value is primitive and the contents octets are the 2's complement encoding of the integer value.

Taken together, ASN.1 integers are always signed, and BER/DER encoding always encodes them as 2's complement. Every value tagged as an integer must be decoded as a signed integer, otherwise the decoding is wrong. The only reason to ever support decoding an unsigned integer is for use with application/private tags. Which is to say, the primitive for "turn this byte slice into an unsigned bignum" should exist, but it should not be used for decoding universal tags, only as a tool for use by implementations of custom tags. And it should also not even be exposed on an integer type as .as_biguint() as this can only serve to reinterpret a signed integer as an unsigned one, thus producing incorrect results.

Also to note, from that asn1-rs/integer.rs link, the conversion to BigInt is already wrong in exactly the same way I was complaining about in this ticket:

    pub fn as_bigint(&self) -> BigInt {
        BigInt::from_bytes_be(Sign::Plus, &self.data)
    }

This is producing a BigInt containing a positive number representation of the integer data interpreted as unsigned. Which is to say, it will produce 255 for 02 01 ff.

Negative numbers will start by 0xff, which means an extra 0x00 has to be inserted before (MSB cannot be 1 in the first content byte).

That's backwards. 02 02 00 ff encodes the positive integer 255. This is because without the leading 00, the ff is then interpreted as negative. So -1 is encoded as 02 01 ff. This case already exists in the asn1-rs/integer.rs you linked, I255-BYTES

chifflier commented 3 years ago

@lilyball You're right, I was completely confused. I rewrote integer parsing in e84d3a9852727735c60eff51a51a104a193c6b94, and also fixed bigint/biguint parsing in 74950d2bd76e18eb20f396267ac561bfd4ccde24

Examples of test vectors can be found in https://github.com/rusticata/der-parser/blob/74950d2bd76e18eb20f396267ac561bfd4ccde24/tests/ber_parser.rs#L142-L244 (for BER, and in der_parser.rs for DER), and I believe they are now correct.

Can you confirm this? If this is fixed, I'll also backport for fix for branch 5.x

lilyball commented 3 years ago

The as_u* methods feel like a footgun. The only reason I can think of for them is specifically for as_u64(), just to cover the case of decoding a positive number that doesn’t fit in i64 but does in u64. I’d recommend removing the others and documenting specifically that this exists for this one case, though it annoys me that this means BerError::IntegerNegative will still need to exist. Perhaps there’s another error that can be returned instead from this one particular method? Or maybe it’s better to just delete as_u64() as well as it feels like a bit of an edge case to need to support numbers that fit in u64 but not i64 without using bignums. And people can always decode the byte slice themselves if they need that. This will cut down the amount of code you need, remove the decode_array_u* methods, and remove BerError::IntegerNegative.

I’m also confused as to why the as_i* methods check for the high bit. Can’t you just zero-pad to the necessary length and use a single code path?

Same question with the bigint decoding. Why does it check high bit and call one of two different BigInt constructors? I’m also not convinced as_biguint() needs to exist, it doesn’t simplify decoding and anyone who wants a BigUint can call as_bigint().to_biguint(). The only reason I can think of to convert directly is if you wanted to implement the ToBigInt/ToBigUint traits but you aren’t doing that.

chifflier commented 3 years ago

Thanks for your reply, your suggestions are good. I'll try to explain my reasoning behind the current choices:

The as_u* methods feel like a footgun. The only reason I can think of for them is specifically for as_u64(), just to cover the case of decoding a positive number that doesn’t fit in i64 but does in u64.

These methods are especially useful to describe the value you expect, not any generic value that could be. And indeed, it would put the burden of handling values close to the limit on the caller (with risks of errors).

This follows the general philosophy of this crate: provide both generic parsing methods (mostly useful when you don't know what the type or value are) and specific methods (great for writing a parser from the ASN.1 grammar: you only have to provide the specific combinators you expect).

IMHO, calling as_u64 is a convenient way to tell "I want an integer, checking that is is 1) positive 2) not greater than 2^64".

I’d recommend removing the others and documenting specifically that this exists for this one case, though it annoys me that this means BerError::IntegerNegative will still need to exist. Perhaps there’s another error that can be returned instead from this one particular method? Or maybe it’s better to just delete as_u64() as well as it feels like a bit of an edge case to need to support numbers that fit in u64 but not i64 without using bignums. And people can always decode the byte slice themselves if they need that. This will cut down the amount of code you need, remove the decode_array_u* methods, and remove BerError::IntegerNegative.

I'd prefer to provide multiple methods (documenting their limitations), but leave the choice to the user. From my experience, this is required to cover many use cases:

providing access to the raw bytes is needed for example in X.509, because the serial number is encoded as a number, but cares for the exact number of bytes (that can be leading zeroes, and would be removed). That's why the slice is stored in the object.
the most generic method to get a number is as_bigint(), but it requires a dependency on num-bigint and can be much slower than other methods. der-parser no supports no_std, and aims at being usable also for low-resources platforms.
I can of course use only one code path, padding to the largest supported type (i64? i128?). However, there are platforms where these types are not natively supported, and the compiler has to insert tons of code to handle them. IMHO, this is a small compromise between some code duplication and remaining efficient (without losing safety).

Having multiple parsing methods is useful everywhere in crates where you know the ASN.1 grammar: it is way easier to write parse_ber_u32() when reading a sequence item, rather than parse_ber_integer() followed by as_bigint(), then to_u64(), and adding error checking code everywhere in the calling code.

I’m also confused as to why the as_i* methods check for the high bit. Can’t you just zero-pad to the necessary length and use a single code path?

No, because negative values are padded with 0xff (2-complement of 0). Also, positive values must be checked to remove leading zeroes that could make the bytes longer than the buffer, but would not change the value.

Same question with the bigint decoding. Why does it check high bit and call one of two different BigInt constructors? I’m also not convinced as_biguint() needs to exist, it doesn’t simplify decoding and anyone who wants a BigUint can call as_bigint().to_biguint(). The only reason I can think of to convert directly is if you wanted to implement the ToBigInt/ToBigUint traits but you aren’t doing that.

The two constructors are here because originally, I was using BigInt::from_bytes_be, but for some reason using Sign::Minus did not work. This may not be correct, I'll check if from_signed_bytes_be works for both sign/unsigned cases.

There is no strong reason for providing as_biguint(). The only think I'm disliking about chaining as_bigint().to_biguint() is that this may fail for various reasons (for ex. wrong tag) for the former, and to_biguint() returns an Option type, so error handling is not really consistent.

chifflier commented 3 years ago

5.1.1 has been published and fixes the signed/unsigned problem, so I'm closing this.

Feel free to reopen or open another issue if you would like to discuss further. Thanks

rusticata / der-parser

Decoding ASN.1 signed integers always produces unsigned result #49