perf: investigate using jemalloc

Just did some quick performance tests, and it seems like overall jemalloc would be an improvement in our current benchmark suite.

Changes for benchmark

In `Cargo.toml`:

```toml
# Allocators
mimalloc = { version = "0.1.39", optional = true }
jemallocator = { version = "0.5.4", optional = true, features = ["disable_initial_exec_tls"] }
snmalloc-rs = { version = "0.3.4", optional = true, features = ["local_dynamic_tls"] }
```

In `python/src/lib.rs`:

```rust
// Set global allocator
#[cfg(feature = "mimalloc")]
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

#[cfg(feature = "snmalloc-rs")]
#[global_allocator]
static GLOBAL: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;

#[cfg(feature = "jemallocator")]
#[global_allocator]
static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;
```
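For anyone digging into the regressions, here is a rough sketch of the same `#[global_allocator]` mechanism using only the standard library. It wraps `std::alloc::System` and counts allocation calls, which might help tell whether a given benchmark is allocation-heavy in the first place. `CountingAlloc`, `ALLOC_CALLS`, and `allocations_during` are hypothetical names for illustration, not anything in lance, and this is not how the numbers in this issue were collected.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts calls.
struct CountingAlloc;

/// Number of `alloc` calls seen so far, across all threads.
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Only one #[global_allocator] may exist per binary, which is why the
// statics in lib.rs are each gated behind a different cargo feature.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `work` and returns its result plus the number of allocation
/// calls observed while it ran (includes calls from other threads).
fn allocations_during<T>(work: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    let out = work();
    let after = ALLOC_CALLS.load(Ordering::Relaxed);
    (out, after - before)
}

fn main() {
    // Stand-in workload: many small heap allocations.
    let (boxes, calls) = allocations_during(|| {
        (0..1000u64).map(Box::new).collect::<Vec<_>>()
    });
    println!("{} boxes, {} allocation calls", boxes.len(), calls);
}
```

Swapping `System` for one of the allocators above would give per-workload allocation counts under that allocator, assuming the corresponding crate is added as a dependency.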

On macOS, I find jemalloc is better in almost all situations, except for some regressions in writes:

[Screenshot: macOS benchmark results, 2023-11-11]

On Linux, it's also generally good, though there are some regressions for scans, which might be worth looking into:

[Screenshot: Linux benchmark results, 2023-11-11]

https://docs.google.com/spreadsheets/d/1hfPkVX0bkZTcWm1jSBdxWYl5zo24F2YfPqh98NYYY0w/edit#gid=1933822679

Overall, I'm inclined to wait until we understand why the regressions in Linux scans occur.

lancedb / lance

perf: investigate using jemalloc #1372