Generate Rust bindings to Intel GKL for SmithWatermanAligner and PairHMM

rhysnewell commented 2 years ago

This is a non-trivial task but I think might be possible. The GATK makes use of some specially designed C libraries made by intel which greatly improves the performance of the SmithWatermanAligner and PairHMM classes. It seems like it is possible to make calls to native C libraries in Rust but there is a bit of work involved, see: https://medium.com/dwelo-r-d/using-c-libraries-in-rust-13961948c72a and https://doc.rust-lang.org/nomicon/ffi.html

I doubt I have the time to work on this at the moment, and Lorikeet is already faster than the GATK... BUT it could be faster. So if anyone sees this and has experience with this sort of thing, then your help would be much appreciated.

rhysnewell commented 2 years ago

I've made this post on reddit: https://www.reddit.com/r/rust/comments/qhaaju/help_wanted_opensource_genomics_project_in_rust/

A lot of people have provided some great feedback and suggestions. Additionally, some users have pointed out what feasibility of getting the GKL library to work with Rust is. The problem with it is that it is written in JNI, which is not something you can directly use bindgen on and get Rust to use. But one user (brand_x) pointed out the follow

It's not quite so bad as it looked. While this project is, indeed, straight JNI (as in, there's no cleanly separated C library to point bindgen at), it looks like the separation is actually fairly clean within the project. All the files named Intel* are jni binding, and they happen to nicely document what would need to be exported into pure C interfaces to be fed to bindgen. Everything else appears to be pure C++.

The two algorithms mentioned have fairly clean interfaces in their (non-JNI) C++ header files, and it might be sufficient to edit the project files to exclude the JNI, bindgen those headers, and go for it.

Additionally, dannohung pointed out that this pull request https://github.com/Intel-HLS/GKL/pull/91 seems to have done a lot of the hard work already!

philipc commented 2 years ago

I've created a simple binding for the pairhmm likelihood at https://github.com/philipc/gkl-rs. It contains a copy of the Intel GKL code, rather than trying to adapt its build system. Not published and no documentation yet, but I'm sure you can figure it out. See the tests for example usage. Let me know if this is useful to you, and I can do the SmithWatermanAligner too.

philipc commented 2 years ago

I've also been contemplating converting the code to rust, mostly as an experiment with using SIMD in rust. Not sure if that would be useful to you too. I'm guessing you might prefer to track the upstream GKL instead.

rhysnewell commented 2 years ago

That's awesome! I'll try and implement this as soon as I can and see how it goes. I'll let you know if I have any questions

As to your second point, I'm sure there is definitely an audience of other users who would find use in you fully reimplementing it rust. I look forward to watching your progress on it if you get around to it

rhysnewell commented 2 years ago

Hi @philipc,

I think I've got the pairhmm refactor mostly working but I've just come across a strange issue I was hoping you could help me out with. The likelihood values returned by gkl-rs seem to be way off when I use it lorikeet, but the actual tests for gkl-rs pass perfectly fine when I run them myself. I've added in a new unit test that displays this issue for me (https://github.com/rhysnewell/Lorikeet/blob/gkl/tests/vector_pair_hmm_unit_tests.rs), as you can see the start of it is pretty much the same as the test you've got set up but the value the forward function returns are way off from the expected result.

The section I'm talking about looks like this:

        let avx_function = forward().unwrap();
        let result = avx_function(
            hap_bases,
            bases,
            base_quals,
            insertion_quals,
            deletion_quals,
            gcp
        );

        println!("Direct result {}", result);
        // This should work but doesn't??
        assert!((result - expected_result).abs() < 1e-5);

Am I missing something here? Is there perhaps a backend library that I need to update for gkl to give the correct result?

Cheers, Rhys

lnicola commented 2 years ago

@rhysnewell can you try cargo miri test? You'll need to run rustup component add miri first.

It will run the unit tests under a checker that should spot any issues in unsafe Rust (but not C++) code.

philipc commented 2 years ago

There was some missing initialisation, I've pushed a fix for that.

I briefly tried running miri, but it complains about file I/O and I didn't find the correct invocation yet.

lnicola commented 2 years ago

For the record, I've used:

MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test --release

philipc commented 2 years ago

Thanks, that gets miri to run, but I suspect that it doesn't check anything because the AVX target features won't be enabled?

lnicola commented 2 years ago

Yeah, sorry, I suppose it's not ready yet:

     Running tests/pairhmm.rs (target/miri/x86_64-unknown-linux-gnu/release/deps/pairhmm-244940cddc3aaaa9)

running 2 tests
test data_file ... error: unsupported operation: `extern` static DefId(18:21 ~ gkl[f7cc]::pairhmm::::ConvertChar_conversionTable) is not supported by Miri
   --> /home/grayshade/gkl-rs/src/pairhmm.rs:30:9
    |
30  |         ConvertChar_conversionTable[b'A' as usize] = 0;
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^ `extern` static DefId(18:21 ~ gkl[f7cc]::pairhmm::::ConvertChar_conversionTable) is not supported by Miri
    |
    = help: this is likely not a bug in the program; it indicates that the program performed an operation that the interpreter does not support

    = note: inside `gkl::pairhmm::convert_char_init` at /home/grayshade/gkl-rs/src/pairhmm.rs:30:9
    = note: inside `gkl::pairhmm::forward_f32_avx` at /home/grayshade/gkl-rs/src/pairhmm.rs:64:5

rhysnewell commented 2 years ago

@philipc Looks like that update fixed this issue! I'll post some new benchmarks in the morning with avx enabled.

The smith waterman stuff looks almost done as well but are you still working on that or should I try implementing that as well?

philipc commented 2 years ago

I think the smith waterman is done. It doesn't have good tests yet though (GKL doesn't either). Also, let me know if anything can be done to make the API simpler for you (I'm not sure about the whole thing with returning a function pointer).

rhysnewell commented 2 years ago

Hey @philipc,

Sorry this took longer to get done than expected due to some server issues on our end.

I've got the smith waterman implemented and ran my unit tests using the AVX version and they were all green, so looks like it's good to use! As for the API, I'm happy with it although it does feel a bit odd to be unwrapping a pointer to a function but I think it is actually the best way to handle it. I like that it is nice and simple, just a couple of functions that slot neatly into what I've already got going.

So now for the benchmarks! Things look great, I'm really happy with your work here:

speed_benchmarks_combined_no_instrain

These are two different benchmarks on (A) a low depth sample, and (B) a high depth sample. The AVX version is about 3 times faster than the logless caching version, and maybe about 6 times faster than GATK AVX which is great. Additionally, the CPU usage is decreased dramatically especially on that low depth sample. RAM goes down as well which is nice to see. Another benchmark I ran that isn't included here is even more impressive, something that potentially was going to take days to complete managed to finish up in 1 hour and 20 minutes. So way more user friendly now!

I think I'll close this issue now as you have solved the issue. Would you be able to send me through your details to rhys.newell@hdr.qut.edu.au and we'll make sure to include you as an author.

Thanks once again, brilliant stuff Cheers, Rhys

philipc commented 2 years ago

I've got a rough rust implementation done for PairHMM, and it is significantly faster than the C version (25-35% improvement depending on the vector type). I expect this improvement is due to implementation differences that could probably be applied to the C version too.

I think the main drawback of the rust version is that AVX-512 support still requires nightly rust. Is this something that you would be interested in using? Maybe use a cargo feature to enable it in gkl-rs?

rhysnewell commented 2 years ago

That sounds great! I'd definitely be interested in a pure rust implementation.

That's a bit of a bummer about nightly, but kind of expected as well. I think a cargo feature would be the best way to go and then it would be super easy to make sure that worked with lorikeet

rhysnewell commented 2 years ago

Hi @philipc , Just wanted to check in and see how things were progressing with the rust implementations of these algorithms. I've poked around the GitHub and things seem fairly complete imo, but just wondering if you needed some help closing it out or something?

Cheers, Rhys

philipc commented 2 years ago

The main thing I still want to do is to get it all working on stable rust. This will need the inline asm support in rust 1.59, which releases on Feb 24. This is required for the AVX-512 support, which doesn't have stabilized instrinsics, so we'll need to use inline asm instead. I still need to check that the inline asm performance is just as good though. There are also a couple of llvm optimization bugs which I concluded needed to be worked around with inline asm. I'd also like to do a NEON implementation but that's not important right now.

rhysnewell commented 2 years ago

Ah that makes sense, I'm happy to continue waiting for stable rust so no rush. That sounds good to me, I'm sure the performance should be on par. Even if it isn't, having it all in pure stable rust is pretty nice bonus in my opinion. Speaking of bugs, I've opened up an issue on gkl-rs regarding some maybe incorrect alignments being produced by the smith-waterman aligner that you may or not already be aware of.

I don't think I know what NEON is, what is it used for?

philipc commented 2 years ago

NEON is the ARM SIMD instruction set. I don't know how commonly ARM is used for HPC.

philipc commented 2 years ago

I was mistaken. Inline asm isn't enough to avoid needing some AVX-512 stabilisation too. We need the target features (avx512f etc) and some of the types (__m512 etc). Not sure where to go with that. Do you want to use nightly rust?

rhysnewell commented 2 years ago

I'd probably want to avoid the use of nightly for Lorikeet at the moment as it would make packaging the software through conda a lot more difficult for me I think. I'm happy to keep using gkl 0.1.1 for now though and you can switch gkl over to nightly so other people can make use of the fastest version, but that's up to you

rhysnewell / Lorikeet

Generate Rust bindings to Intel GKL for SmithWatermanAligner and PairHMM #21