imartayan / CBL

A Rust library providing fully dynamic sets of k-mers with high locality
MIT License
43 stars 0 forks source link

Discussion / question #1

Open rob-p opened 8 months ago

rob-p commented 8 months ago

Hi @imartayan,

Very exciting work; congratulations on this! This isn't an issue per-se, but rather a discussion point that I wanted to raise (I raised a similar one in the ggcat repo a while back).

While I understand the desire to make our own lives (as developers) as easy as possible, I wonder if you might be able to enumerate what specific nightly features are required by CBL, and what prevents it from building on the latest stable rust (or at least beta).

Jon Gjengset — one of my favorite Rustaceans — has an excellent talk (relevant part linked here) about the tradeoffs of relying on nightly features and why it may, much of the time, just not be worthwhile. In particular, I'm curious what would be required to build on stable (or beta), and what particular features are being used. Features that are slated for stabilization mean it's just a matter of time — few release cycles — until those are on stable. But some nightly features may never make it to stable, or be removed or abandoned (or be unsound 😱), and may be worth replacing with something else, or a stable crate that emulates their behavior.

Anyway, I just wanted to kick off this discussion with you to get your thoughts and feedback. Congrats again!

--Rob

imartayan commented 8 months ago

Hi @rob-p, thanks for the kind words!

That's a fair point, and I agree that using nightly features is not ideal. CBL currently relies on two nightly features: the chunk_by method for slices (previously known as group_by) and generic_const_exprs.

chunk_by has just been stabilized in 1.77 (see https://github.com/rust-lang/rust/issues/80552), so it's only a matter of time before it hits the stable channel (21 March 2024 according to releases.rs).

On the other hand, generic_const_exprs is nowhere near stabilization (see https://github.com/rust-lang/rust/issues/76560) and may be a bit more difficult to replace. I'm using this feature to compute the exact size (in bits) of the words I'm working with at compile time. I use it to build specialized tries for words of fixed length, and it also lets the compiler optimize some parts of the code. I'm not sure how to replace this feature yet, one way might be to compute some of these parameters at runtime but this might have a significant impact on performance.

One last thing, I was considering using the portable_simd feature (which is also nightly) to use SIMD operations without having to write architecture specific code, do you have any recommendations for a stable crate with similar features?

rob-p commented 8 months ago

Hi @imartayan,

Thanks for the detailed and thoughtful reply! Yes, the use of chunk_by should be no issue at all. Actually, that may already be in beta, and rust pretty much always makes it's release dates, so it's just around the corner.

Regarding portable SIMD, the folks to ask there would be @Daniel-Liu-c0deb0t and @RagnarGrootKoerkamp!

Regarding the generic_const_exprs, it might be worth looking at where they actually show up in the code and asking if they could be replaced by const_generics (MVP) the stable part, plus, maybe, some macros? Perhaps as we played around with a bit here, but maybe more complex given your needs? I'd be happy to talk through some of that if it makes sense!

Best, Rob

RagnarGrootKoerkamp commented 8 months ago

Re simd, that's also one of the big reasons A*PA uses unstable. There actually was a small backwards incompatible change recently which was kinda annoying. But it's so convenient to use that I don't really see myself switching to stable intrinsics.

rob-p commented 8 months ago

Also worth noting the decision isn't binary. Even if one needs unstable, the fewer unstable features one relies upon (and the more likely those are to be stabilized soon) the better :).

rob-p commented 8 months ago

Surprised honestly there isn't something like SIMDe for Rust yet.