Closed: tshauck closed this 11 months ago
While this performance difference can be observed with how your benchmarks are defined (i.e., creating a new buffer for each iteration), in practice there shouldn't be a difference with the current BAM decoder. The CIGAR buffer in a sam::alignment::Record is reused for each record, and Vec::clear does not affect its internal capacity. (Also note that Vec::reserve reserves space for an additional number of entries, not for the expected total count, unless the capacity is already 0.)
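Both points can be demonstrated directly with the standard library. This is a standalone sketch, not noodles code:

```rust
fn main() {
    // `Vec::clear` removes the elements but keeps the allocation.
    let mut buf: Vec<u32> = Vec::with_capacity(16);
    buf.extend(0..16);
    buf.clear();
    assert_eq!(buf.len(), 0);
    assert!(buf.capacity() >= 16); // capacity is untouched by `clear`

    // `Vec::reserve(n)` asks for room for `n` *additional* elements.
    let mut v: Vec<u32> = Vec::new();
    v.reserve(8); // empty, so this is effectively a request for total capacity 8
    assert!(v.capacity() >= 8);
    v.extend(0..8);
    v.reserve(8); // now requests room for 8 *more*, i.e., total capacity >= 16
    assert!(v.capacity() >= 16);
    println!("ok");
}
```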
> I noticed while profiling a tool of mine that calls to get_cigar may do n allocations based on the number of cigar ops.
Can you share this report? When a Vec is empty, the first push allocates space for a minimum of 4 items, and when at capacity, the size of the Vec doubles. E.g., there will be an allocation at item 1 (capacity 4), item 5 (capacity 8), item 9 (capacity 16), etc.
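For illustration, here's a standalone sketch that observes this growth pattern by watching for capacity changes across pushes. (The exact growth factors are an implementation detail of the current standard library, not a documented guarantee.)

```rust
fn main() {
    let mut v: Vec<u32> = Vec::new();
    let mut last_cap = v.capacity(); // starts at 0; nothing allocated yet

    for i in 0..17u32 {
        v.push(i);
        if v.capacity() != last_cap {
            // A reallocation happened on this push.
            println!("push {:2} -> capacity {}", i + 1, v.capacity());
            last_cap = v.capacity();
        }
    }
    // On current rustc this prints reallocations at pushes 1, 5, 9, and 17.
}
```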
Thanks for the reply. I should've added a little more color to my use case. I'll have a closer look tonight and follow up, but I think the lack of reuse is due to me working with a lazy BAM record and then calling let cigar: Cigar = record.cigar().try_into()?; to convert the lazy CIGAR into a regular one, and I think TryFrom does create a new Cigar.
Edit: If reserve is to be included, perhaps moving it into try_from might make more sense?
Ah, I see. Since this only affects the conversion use case, can we use the ops iterator instead of the decoder in the implementation of lazy::record::Cigar::try_from, i.e., call self.iter().collect()? The iterator has the correct size hint set, which preallocates the resulting Vec.
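As a sketch of the idea (Ops here is a stand-in toy type, not the actual noodles lazy CIGAR iterator): an iterator that reports an accurate size_hint lets collect allocate the resulting Vec once instead of growing it by doubling.

```rust
// Toy iterator mimicking a lazy CIGAR ops iterator with a known op count.
struct Ops {
    remaining: usize,
}

impl Iterator for Ops {
    type Item = (u8, u32); // (op kind, length) -- purely illustrative

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some((b'M', 10))
        }
    }

    // An accurate size hint lets `collect` preallocate the Vec up front.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}

fn main() {
    let ops: Vec<(u8, u32)> = Ops { remaining: 100 }.collect();
    assert_eq!(ops.len(), 100);
    // Thanks to the hint, capacity was reserved in one allocation.
    assert!(ops.capacity() >= 100);
    println!("capacity: {}", ops.capacity());
}
```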
Cool, thanks for the feedback. Please let me know if the update is what you had in mind.
Thanks! (And sorry for the slow responses; I just got back from vacation.)
All good -- appreciate the education/patience
Hi,
I'd like to propose a PR that I think could improve BAM CIGAR parsing performance. I noticed while profiling a tool of mine that calls to get_cigar may do n allocations based on the number of cigar ops. Looking at the get_cigar function, it already takes the number of ops, so this PR adds a reserve method on Cigar so that get_cigar can make use of it. I did a little benchmark via criterion that seems to indicate reduced runtime and reduced runtime variance.
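As a rough illustration of the effect (a standalone timing sketch with std::time::Instant rather than the criterion benchmark; the decode_* functions are hypothetical stand-ins for the decoder, not noodles code):

```rust
use std::time::Instant;

// Decoding n ops into a fresh Vec without a capacity hint triggers
// repeated reallocations as the Vec grows by doubling.
fn decode_without_reserve(n: usize) -> Vec<u32> {
    let mut ops = Vec::new();
    for i in 0..n {
        ops.push(i as u32);
    }
    ops
}

// With the op count known up front, a single `reserve` avoids the regrowth.
fn decode_with_reserve(n: usize) -> Vec<u32> {
    let mut ops = Vec::new();
    ops.reserve(n);
    for i in 0..n {
        ops.push(i as u32);
    }
    ops
}

fn main() {
    let n = 1_000;
    let iterations = 10_000;

    let start = Instant::now();
    for _ in 0..iterations {
        std::hint::black_box(decode_without_reserve(n));
    }
    let without = start.elapsed();

    let start = Instant::now();
    for _ in 0..iterations {
        std::hint::black_box(decode_with_reserve(n));
    }
    let with = start.elapsed();

    println!("without reserve: {:?}, with reserve: {:?}", without, with);
}
```

Wall-clock timing like this is noisy; criterion's statistical sampling is the right tool for the real measurement, this just shows the shape of the comparison.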
I put the (slightly overcomplicated 😅) benchmark code in a separate branch here: https://github.com/tshauck/noodles/tree/add-reserve-to-cigar-decode-benches/target/criterion (with some changes to noodles-bam too).