librasn / rasn

A Safe #[no_std] ASN.1 Codec Framework
Other
183 stars 43 forks source link

DER decoding of empty sequences is incredibly slow #224

Closed knecht closed 4 months ago

knecht commented 4 months ago

Decoding even a single empty sequence sometimes takes 100-200 ms. Doing that on more than one thread concurrently, it gets even slower and most threads are waiting although they should be working. Looks as if some nasty locks were involved. Example with more empty sequences in one DER blob:

decoding 1x 206 bytes took 194 ms
decoding 100x 206 bytes took 6260 ms
now decoding on 4 threads, 100x 206 bytes on each
decoding 100x 206 bytes took 31488 ms
decoding 100x 206 bytes took 31881 ms
decoding 100x 206 bytes took 32242 ms
decoding 100x 206 bytes took 32245 ms

this is the code producing the output above:

use std::time;

use rasn::{AsnType, Decode, Encode};

#[derive(AsnType, Encode, Decode, Debug)]
struct Person {
    #[rasn(tag(context, 0))] // or just #[rasn(tag(0))]
    age: Option<String>,
    name: Option<String>,
}

#[derive(AsnType, Encode, Decode, Debug)]
struct Group {
    members: Vec<Person>,
}

fn main() {
    let data = hex::decode("3081cb3081c83000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000300030003000").unwrap();
    let len = data.len();
    let test = |times: u64| {
        let t = time::Instant::now();
        for _ in 0..times {
            rasn::der::decode::<Group>(&data).unwrap();
        }
        let dt = std::time::Instant::now().duration_since(t).as_millis();
        println!("decoding {times}x {len} bytes took {dt} ms");
    };
    test(1);
    test(100);

    println!("now decoding on 4 threads, 100x {len} bytes on each");
    std::thread::scope(|s| {
        let mut join_handles = Vec::new();
        for _ in 0..4 {
            join_handles.push(s.spawn(|| {
                test(100);
            }));
        }
        for jh in join_handles {
            jh.join().unwrap();
        }
    });
}
XAMPPRocky commented 4 months ago

Thank you for your issue! Yeah performance hasn't been a focus as the focus has been on correctness, I reckon this can be improved significantly by improving how parse_encoded_contents to better reduce work done.

knecht commented 4 months ago

It might have to do with parsing empty slices which results in errors. Both gdb and flamegraph show a lot of backtrace gathering related stack frames. Although no backtrace becomes visible, it looks as if they are being produced and discarded unless the occurrence of these frame is simply a result of the debugging procedure.

XAMPPRocky commented 4 months ago

Yeah, I reckon that's also it, we could probably make it better by making the loop internal to parse_encoded_contents, that way we can skip calling it entirely if there's zero length.

XAMPPRocky commented 4 months ago

I've fixed this with the above commit, this is the output of the above code post that commit. FYI in the future, it would be great if this was written with criterion or iai because then we can commit it to our benches. Thank you for your issue!

decoding 1x 206 bytes took 0 ms
decoding 100x 206 bytes took 6 ms
now decoding on 4 threads, 100x 206 bytes on each
decoding 100x 206 bytes took 10 ms
decoding 100x 206 bytes took 10 ms
decoding 100x 206 bytes took 10 ms
decoding 100x 206 bytes took 11 ms