perf: don't heap allocate a BigInt for every integer that is decoded

repnop commented 3 weeks ago

there's some outstanding concerns around behavior, though all of the tests do pass. mainly, with the wrapping_add in the PER parse_integer method, and with the as casts in the integer type IntegerType trait impls as well, but it seems to be working assuming the tests cover enough ground. current preliminary benchmarks seem promising (after modifying the common.rs file in main to include integer benchmarking), though maybe not quite as significant as I had hoped, but I can perform more performance analysis at some point in the future that specifically looks at the integer decoding alone.

my laptop info for this benchmark run:

OS: Arch Linux x86_64
Host: Laptop 13 (AMD Ryzen 7040Series) A7
Kernel: 6.9.3-arch1-1
CPU: AMD Ryzen 7 7840U w/ Radeon 780M Graphics (16) @ 5.1
Memory: 16050MiB / 27868MiB

before

ber/decode              time:   [1.5740 µs 1.5848 µs 1.5964 µs]
                        change: [-0.2673% +1.2938% +2.9768%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

der/decode              time:   [1.5334 µs 1.5440 µs 1.5550 µs]
                        change: [-2.8790% -0.8514% +0.6799%] (p = 0.41 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

cer/decode              time:   [6.2658 µs 6.2982 µs 6.3305 µs]
                        change: [-0.4698% +0.6365% +2.0601%] (p = 0.34 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

uper/decode             time:   [5.0579 µs 5.0919 µs 5.1279 µs]
                        change: [-2.4254% -0.8317% +0.6502%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

after

ber/decode              time:   [1.3727 µs 1.3816 µs 1.3915 µs]
                        change: [-15.479% -14.106% -12.900%] (p = 0.00 < 0.05)
                        Performance has improved.

der/decode              time:   [1.3505 µs 1.3587 µs 1.3676 µs]
                        change: [-12.842% -11.670% -10.479%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

cer/decode              time:   [5.6029 µs 5.6373 µs 5.6722 µs]
                        change: [-11.062% -9.8772% -8.7710%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

uper/decode             time:   [4.6751 µs 4.7029 µs 4.7329 µs]
                        change: [-8.7410% -7.6456% -6.5511%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

repnop commented 3 weeks ago

it was unrelated, but while investigating failing tests, I noticed a bug in the DecodeError display impl which accidentally printed the self.kind instead of self.codec for the Codec: message, so I included that here as well.

Nicceboy commented 3 weeks ago

Nice! The performance boost seems to be similar compared what I was doing. I am not sure if it is easy to get faster from integers alone, without changing the logic how constraints are handled. There are a lot of other allocations too, so in general decoding integers alone are not that visible, at least based on my debugging and benchmarking.

but I can perform more performance analysis at some point in the future that specifically looks at the integer decoding alone.

You can reuse the pure integer bench from here: https://github.com/Nicceboy/rasn/blob/integer-remake/benches/integer.rs

It is significant for Integer type (M2 Pro machine):

Thought, BER decoding seems to panic on that bench now.

repnop commented 3 weeks ago

Nice! The performance boost seems to be similar compared what I was doing. I am not sure if it is easy to get faster from integers alone, without changing the logic how constraints are handled. There are a lot of other allocations too, so in general decoding integers alone are not that visible, at least based on my debugging and benchmarking.

yeah it is kind of a microbenchmark with what I have, I'll try to recreate my original NetSNMP benchmark comparison next week sometime to see how that fairs and perhaps try to stand up something that collects some real-world performance data with SNMP devices I have access to at work to get a better idea of where things could be improved in general as well.

but I can perform more performance analysis at some point in the future that specifically looks at the integer decoding alone.

You can reuse the pure integer bench from here: https://github.com/Nicceboy/rasn/blob/integer-remake/benches/integer.rs

great to know about, thanks! I'll definitely give it a look

It is significant for Integer type (M2 Pro machine):

now those are the numbers I like to see 👀 awesome, thanks for trying it out!

XAMPPRocky commented 3 weeks ago

Thank you for your PR!

repnop commented 3 weeks ago

I think there's a bug with the decoding logic that causes a panic (which looks to be the same source as in Nicceboy's benchmark), I need to figure out the source but I'll put in a PR with that fix here shortly hopefully. not sure if its worth yanking 0.15.1 for the time being

librasn / rasn

perf: don't heap allocate a BigInt for every integer that is decoded #256