librasn / rasn

A Safe #[no_std] ASN.1 Codec Framework

Rework Integer type for speed #254

Closed Nicceboy closed 3 weeks ago

Nicceboy commented 1 month ago

Hi,

I have been working on improving the Integer type to get better performance. This is a partial solution for https://github.com/librasn/rasn/issues/76

After trying out many different approaches, the current idea is to make the Integer type an enum as follows:

#[cfg(not(feature = "i128"))]
pub type PrimitiveInteger = i64;

#[cfg(all(target_has_atomic = "128", feature = "i128"))]
pub type PrimitiveInteger = i128;

pub const PRIMITIVE_BYTE_WIDTH: usize = core::mem::size_of::<PrimitiveInteger>();

#[derive(Debug, Clone, Ord, Hash, Eq, PartialEq, PartialOrd)]
#[non_exhaustive]
pub enum Integer {
    Primitive(PrimitiveInteger),
    Big(BigInt),
}

It uses the primitive type internally by default whenever possible to get the best performance, except in cases where a big integer is used manually. Overflows and underflows are automatically converted to the Big variant. BigInt could also easily be swapped for another type, if we would like to.
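
For illustration, here is a minimal sketch of how such overflow promotion could look for an arithmetic operation (not the PR's actual code; it assumes BigInt is num_bigint::BigInt):

use num_bigint::BigInt;

impl core::ops::Add for Integer {
    type Output = Integer;

    fn add(self, rhs: Integer) -> Integer {
        use Integer::*;
        match (self, rhs) {
            // Stay on the primitive fast path while the result still fits.
            (Primitive(a), Primitive(b)) => match a.checked_add(b) {
                Some(sum) => Primitive(sum),
                // Overflow: promote the result to the arbitrary-precision variant.
                None => Big(BigInt::from(a) + BigInt::from(b)),
            },
            (Primitive(a), Big(b)) | (Big(b), Primitive(a)) => Big(BigInt::from(a) + b),
            (Big(a), Big(b)) => Big(a + b),
        }
    }
}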

I am not sure if I over-engineered this one, but I wanted to make it easier to drop the mandated i128 requirement in the future if we decide to, so that rasn could be used on targets which do not support that type, since it is a #![no_std] crate.

Now, all the internal types can be changed just by changing the PrimitiveInteger type alias.

Overall, BER and OER get around a 2x performance boost for Integers. UPER gets less, as there are many other allocations in that code which contribute more to its current performance. OER can also get some significant boosts when allocations are optimized further.

On an M2 Pro, the overall improvement based on the benchmark mentioned in issue #244, with i128 as the default:

[benchmark results image]

With i64, encoding is slightly faster, but decoding is slower.

I tried different styles, but the enum approach seems to get the best of both, and no generics are exposed in any API.

The downside of this approach is that, when decoding data, it is not always certain which inner Integer variant you get (Primitive or Big) if the type is not constrained.
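
As a rough sketch of why that is (the helper name below is hypothetical, not the PR's API, and BigInt is assumed to be num_bigint::BigInt), a decoder can only pick the variant based on how wide the encoded value is:

use num_bigint::BigInt;

impl Integer {
    /// Hypothetical helper: choose the variant from a two's-complement,
    /// big-endian byte string based on its length.
    pub fn from_signed_bytes_be(bytes: &[u8]) -> Self {
        if bytes.is_empty() {
            return Integer::Primitive(0);
        }
        if bytes.len() <= PRIMITIVE_BYTE_WIDTH {
            // Sign-extend into a fixed-size buffer; no heap allocation needed.
            let fill = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
            let mut buf = [fill; PRIMITIVE_BYTE_WIDTH];
            buf[PRIMITIVE_BYTE_WIDTH - bytes.len()..].copy_from_slice(bytes);
            Integer::Primitive(PrimitiveInteger::from_be_bytes(buf))
        } else {
            // Too wide for the primitive type: fall back to BigInt.
            Integer::Big(BigInt::from_signed_bytes_be(bytes))
        }
    }
}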

I implemented Add, Sub, and Mul operations for Integer because it made some codecs easier to write. Maybe it is better not to add more of them, and instead encourage using the inner type's operations, since more of them would also add maintenance burden.

The current code needs the next release of Rust (1.79, to be released on 13 June) to work. Otherwise, either one line of unsafe code or more complex code is required to get past the borrow checker.

Also, it seems that it is not easy to automatically test the i64 path by removing the i128 feature, because rasn-pkix is a dev-dependency in the workspace, and it makes the rasn tests compile with default features no matter what (unless it is removed, or the features of that crate are changed). Maybe we don't need that bench with pkix?

Maybe @6d7a also wants to take a look, since it is quite a big change.

XAMPPRocky commented 1 month ago

Thank you for your PR! I don't know if you saw this discussion the other week, but I've an alternative proposal for improving integer decode speeds.

https://github.com/librasn/rasn/discussions/107#discussioncomment-9522935

I think this will provide better performance for most cases, as it will allow for the use of smaller container types without the need for branching since we know what integer type the user wants and can error out if bytes exceed the integer size.
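
A rough sketch of what such a decode-side trait could look like (the names and error type below are illustrative, not necessarily the actual shape of the proposal):

trait IntegerType: Sized {
    // Decode a big-endian two's-complement value directly into the caller's
    // chosen integer type, erroring out instead of widening to a big integer.
    fn try_from_signed_bytes_be(bytes: &[u8]) -> Result<Self, IntegerOverflow>;
}

struct IntegerOverflow;

impl IntegerType for i32 {
    fn try_from_signed_bytes_be(bytes: &[u8]) -> Result<Self, IntegerOverflow> {
        const WIDTH: usize = core::mem::size_of::<i32>();
        if bytes.is_empty() || bytes.len() > WIDTH {
            // Empty input, or more bytes than the target type can hold.
            return Err(IntegerOverflow);
        }
        let fill = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
        let mut buf = [fill; WIDTH];
        buf[WIDTH - bytes.len()..].copy_from_slice(bytes);
        Ok(i32::from_be_bytes(buf))
    }
}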

I think we can still use something like this enum for codec-internal numerics like lengths and similar, though.

Nicceboy commented 1 month ago

Thank you for your PR! I don't know if you saw this discussion the other week, but I've an alternative proposal for improving integer decode speeds.

#107 (reply in thread)

I think this will provide better performance for most cases, as it will allow for the use of smaller container types without the need for branching since we know what integer type the user wants and can error out if bytes exceed the integer size.

I think we can still use something like this enum for codec-internal numerics like lengths and similar, though.

Yeah, that will reduce branching. If I understand it correctly, we would need to rework the integer decode logic for every binary codec to get the best out of it? It is a bit more complex to add.

We have a comparison point now to see the performance impact. However, it would be easier to see if we also optimized the other allocations.

How about the encoding side? I assume it should be based on the trait as well. I tried a trait-based approach earlier, but the main challenge there is providing the same output type for encoding. A single to_vec call for primitives will remove all the performance benefits. Big numbers are vectors and primitives are fixed-size arrays (or pointers with unsafe).

Since there is a need to branch the logic based on fixed-size array vs. dynamic vector, I ended up using an enum, where that was easier.

I can try to continue towards that suggestion, unless @repnop has already got started.

XAMPPRocky commented 1 month ago

Yeah, that will reduce branching. If I understand it correctly, we would need to rework the integer decode logic for every binary codec to get the best out of it? It is a bit more complex to add.

Yes, but it should only mean changing a couple of lines in each, as AFAIK all of the codecs eventually get to the point where an integer is a bag of big-endian bytes and convert it, so it would just be changing that part to use the trait.

A single to_vec call for primitives will remove all the performance benefits. Big numbers are vectors and primitives are fixed-size arrays (or pointers with unsafe).

Since we're using static dispatch, we can use impl Trait here to say it returns impl AsRef<[u8]>, and we don't have to care about what type we get back, because we just care about getting a &[u8].


trait IntegerType {
    fn to_bytes_be(&self) -> impl AsRef<[u8]>;
}
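
As a quick sketch of why this unifies both cases (the impls below are illustrative and assume BigInt is num_bigint::BigInt): a fixed-size array and a Vec<u8> both implement AsRef<[u8]>, so the primitive path never has to allocate.

use num_bigint::BigInt;

impl IntegerType for i64 {
    fn to_bytes_be(&self) -> impl AsRef<[u8]> {
        // [u8; 8]: lives on the stack, no heap allocation.
        self.to_be_bytes()
    }
}

impl IntegerType for BigInt {
    fn to_bytes_be(&self) -> impl AsRef<[u8]> {
        // Vec<u8>: allocation only for the arbitrary-precision case.
        self.to_signed_bytes_be()
    }
}
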
Nicceboy commented 1 month ago

Yes, but it should only mean changing a couple of lines in each, as AFAIK all of the codecs eventually get to the point where an integer is a bag of big-endian bytes and convert it, so it would just be changing that part to use the trait.

I was thinking that maybe we could avoid some of that code before that point, since the constraints are passed in, and if we define our own implementation for different types for every codec, we could hardcode the ranges and format of the encodings for that specific constraint. At least in OER (probably that was bad design), everything is always calculated based on the constraint ranges, so some calculations could potentially be skipped (if that is even worth it).

Since we're using static dispatch, we can use impl Trait here to say it returns impl AsRef<[u8]>, and we don't have to care about what type we get back, because we just care about getting a &[u8].

Wow, that was too easy and seems to work well. I need to read more books...

Nicceboy commented 3 weeks ago

Closing in favor of #256