Closed by bwlang 7 months ago
This is on an arm64 MacBook.
noodles tends to do more record validation and tries to ensure data is spec-compliant. However, in your flamegraphs, it seems the DEFLATE implementation htslib is linked to uses SIMD intrinsics (see inflate_fast_neon and longest_match_neon) on your CPU. By default, noodles uses the pure Rust implementation miniz_oxide, which can't compete in this case.
Can you retime your tests using libdeflate for both? I.e., cargo add noodles-bgzf --features libdeflate for noodles and cargo add rust-htslib --features libdeflate for rust-htslib. libdeflate is much faster for this application regardless, and it would be a fairer test than comparing different DEFLATE implementations.
with htslib libdeflate:
cat 0.00s user 0.01s system 1% cpu 0.574 total
sudo cargo run --release -- > tandem_reads.bam 0.45s user 0.03s system 83% cpu 0.577 total
with noodles libdeflate:
cat 0.00s user 0.01s system 1% cpu 0.740 total
sudo cargo run --release -- > tandem_reads.bam 0.61s user 0.05s system 89% cpu 0.743 total
much closer... thanks for the suggestion! Anything else I should try?
If you don't need owned records, you can reuse the record buffer.
let mut record = sam::alignment::Record::default();
while bam_reader.read_record(&header, &mut record)? != 0 {
// ...
}
When doing this, you may also want to use Reader::rc_records in your rust-htslib test for a fairer comparison.
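To illustrate why reusing a buffer helps, here is a minimal sketch of the same pattern using only the Rust standard library. It is not the noodles or rust-htslib API: the function names and the tab-separated sample data are made up for the example, and lines of text stand in for BAM records. The first function fills one buffer over and over, the way read_record does above; the second allocates a new owned value per item, the way an owned-record iterator does.

```rust
use std::io::{self, BufRead};

/// Counts non-empty lines while reusing a single String buffer.
/// (Hypothetical helper; analogous to reading into one reusable record.)
fn count_with_reused_buffer<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new(); // allocated once, reused for every line
    let mut n = 0;
    loop {
        line.clear(); // drops the old contents but keeps the capacity
        if reader.read_line(&mut line)? == 0 {
            break; // 0 bytes read means end of input
        }
        if !line.trim_end().is_empty() {
            n += 1;
        }
    }
    Ok(n)
}

/// Counts non-empty lines, allocating a fresh String per line via `lines()`.
/// (Hypothetical helper; analogous to iterating over owned records.)
fn count_with_owned_lines<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut n = 0;
    for line in reader.lines() {
        if !line?.is_empty() {
            n += 1;
        }
    }
    Ok(n)
}

fn main() -> io::Result<()> {
    // Made-up stand-in data; a real test would read from a file or stdin.
    let data = "r1\t99\nr2\t147\n\nr3\t4\n";
    let reused = count_with_reused_buffer(data.as_bytes())?;
    let owned = count_with_owned_lines(data.as_bytes())?;
    assert_eq!(reused, owned);
    println!("{reused} non-empty lines");
    Ok(())
}
```

Both functions return the same count; the difference is only in how many allocations happen along the way, which is what tends to show up in a flamegraph for a passthrough-style tool.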
noodles 0.61.0 now allows alignment format records to be read and written, but I'm not sure if this will help in a passthrough case like in your tests. Feel free to open a new discussion if you have any further questions.
Flamegraphs (images not preserved): rust-htslib, noodles.
I first implemented with htslib, then tried switching to noodles with no architectural changes.
For noodles I'm opening like this:
For htslib like this:
The internal filtering logic is the same: reasoning about flags, insert size, etc., and writing out some records.
Any suggestions?