oxc-project / oxc-browserslist

Rust port of browserslist
https://docs.rs/oxc-browserslist
MIT License

Pre-compile data as binary blobs #23

Open overlookmotel opened 4 months ago

overlookmotel commented 4 months ago

Just to expand on discussion we had earlier...

In my opinion, this crate will never compile as fast as we want it to without a different approach, no matter what we throw at it. It's just tons of data, so using codegen to generate tons of code and then asking Rust to parse and compile it all is always going to take a long time.

A possible solution could be to pre-compile it as binary data. Something like this:

At build time:

At runtime:

Background: rkyv's "special sauce" is relative pointers: https://rkyv.org/architecture/relative-pointers.html
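To illustrate the idea behind relative pointers, here is a minimal std-only sketch. It is not rkyv's API or its on-disk format: the blob layout and the names `resolve_rel_ptr` and `build_blob` are made up for this example. The point is only that an offset stored relative to its own position stays valid wherever the bytes end up in memory.

```rust
/// Read the little-endian `i32` stored at `pos` and treat it as an offset
/// relative to `pos` itself. Returns the absolute index of the target,
/// or `None` if anything falls outside the buffer.
fn resolve_rel_ptr(buf: &[u8], pos: usize) -> Option<usize> {
    let b = buf.get(pos..pos.checked_add(4)?)?;
    let offset = i32::from_le_bytes([b[0], b[1], b[2], b[3]]) as isize;
    let target = (pos as isize).checked_add(offset)?;
    if target >= 0 && (target as usize) < buf.len() {
        Some(target as usize)
    } else {
        None
    }
}

/// Build a tiny hypothetical blob: a relative pointer at index 0 whose
/// target is a single payload byte 8 bytes further on.
fn build_blob() -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&8i32.to_le_bytes()); // offset from the pointer to the payload
    buf.extend_from_slice(&[0u8; 4]); // padding
    buf.push(42); // payload, at index 8
    buf
}
```

Because the offset is relative, copying the blob into the middle of a larger buffer needs no fix-up: resolving the pointer at its new position still lands on the payload.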

Boshen commented 4 months ago

I managed to shrink the code by "surface area" in https://github.com/oxc-project/oxc-browserslist/pull/32/files; compile time is halved, from 8s to 4s.

I'll stop optimizing as I need to do some real work ...

Boshen commented 4 months ago

While cleaning up criterion2, I discovered https://crates.io/crates/ciborium which may help us here.

Boshen commented 4 months ago

For context, these are the two files that slow down compilation: https://github.com/oxc-project/oxc-browserslist/blob/main/src/generated/caniuse_feature_matching.rs and https://github.com/oxc-project/oxc-browserslist/blob/main/src/generated/caniuse_region_matching.rs

Boshen commented 4 months ago

I couldn't find it, but we can reduce a lot of chars if we can code-generate a raw string to remove all the escaped double quotes ... r#"huge string without escaped quotes"#
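A rough sketch of what such a helper could look like, using only std. The name `raw_string_literal` is hypothetical and not something the codegen currently has. It picks just enough `#`s that nothing inside the data can close the literal early, and it assumes the input contains no bare carriage returns, which Rust rejects in raw string literals.

```rust
/// Render `s` as the source text of a Rust raw string literal.
/// A raw string with k hashes ends at a `"` followed by k `#`s, so we need
/// one more hash than the longest run of `#` that directly follows a `"`.
fn raw_string_literal(s: &str) -> String {
    let mut needed = 0usize; // number of hashes required so far
    let mut run: Option<usize> = None; // hashes seen since the last `"`
    for c in s.chars() {
        run = match (c, run) {
            ('"', _) => Some(0),
            ('#', Some(n)) => Some(n + 1),
            _ => None,
        };
        if let Some(n) = run {
            needed = needed.max(n + 1);
        }
    }
    let hashes = "#".repeat(needed);
    format!("r{h}\"{s}\"{h}", h = hashes, s = s)
}
```

For JSON-like data full of double quotes this emits r#"..."# with the quotes left unescaped, which is where the character savings would come from.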

overlookmotel commented 4 months ago

ciborium looks good for compact data representation. However, it has a deserialization step which might be quite costly at runtime.

rkyv's advantage is no deserialization - it stores the data in a form where it can just be loaded into memory and is ready to go. You can load it statically with a zero-cost transmute:

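static DATA: &Data = {
    // The zero-length `[Data; 0]` field takes no space but gives the
    // struct the same alignment as `Data`.
    #[repr(C)] // Guarantee 'bytes' comes after '_align'
    struct Aligned<Bytes: ?Sized> {
        _align: [Data; 0],
        bytes: Bytes,
    }

    // `&Aligned<[u8; N]>` is coerced to `&Aligned<[u8]>` here.
    static ALIGNED: &Aligned<[u8]> =
        &Aligned { _align: [], bytes: *include_bytes!("./data.bin") };
    // Reinterpret the aligned bytes as `Data`. This is only sound if
    // `data.bin` really contains a valid `Data`.
    unsafe { &*(ALIGNED as *const _ as *const Data) }
};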

(filched this code from https://users.rust-lang.org/t/can-i-conveniently-compile-bytes-into-a-rust-program-with-a-specific-alignment/24049/2)
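For anyone who wants to try the alignment trick without a data file, here is a self-contained variant. It makes several assumptions: `Data` is a made-up plain `#[repr(C)]` struct rather than an rkyv archived type, an inline byte array stands in for `include_bytes!`, and the `data()` accessor with its size and alignment checks is an addition that is not in the snippet above. A real rkyv archive would be read through rkyv's own access functions instead of a raw cast.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical payload type: every bit pattern is a valid value.
#[repr(C)]
struct Data {
    version: u32,
    counts: [u16; 2],
}

/// Wrapper whose zero-length `_align` field forces `Data`'s alignment.
#[repr(C)] // guarantee `bytes` comes after `_align`
struct Aligned<Bytes: ?Sized> {
    _align: [Data; 0],
    bytes: Bytes,
}

// Stand-in for `*include_bytes!("./data.bin")`.
static ALIGNED: &Aligned<[u8]> = &Aligned {
    _align: [],
    bytes: [1, 0, 0, 0, 2, 0, 3, 0],
};

/// View the embedded bytes as `Data` without copying or parsing.
fn data() -> &'static Data {
    let bytes = &ALIGNED.bytes;
    assert_eq!(bytes.len(), size_of::<Data>());
    assert_eq!(bytes.as_ptr() as usize % align_of::<Data>(), 0);
    // SAFETY: length and alignment are checked above, and `Data` consists
    // only of integers, so any byte content is a valid `Data`.
    unsafe { &*(bytes.as_ptr() as *const Data) }
}
```

Calling `data()` returns a reference straight into the static bytes. The field values are read in native byte order, so a real blob would have to be generated for the target's endianness.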

Boshen commented 4 months ago

I've labeled this "good first issue" if anyone wants to try to reduce the compilation time of this crate.

The current bottleneck comes from these two files where the data are huge: https://github.com/oxc-project/oxc-browserslist/blob/main/src/generated/caniuse_feature_matching.rs and https://github.com/oxc-project/oxc-browserslist/blob/main/src/generated/caniuse_region_matching.rs

The data is generated by cargo codegen.