Closed jkbonfield closed 3 weeks ago
/usr/bin/time was reporting ~810% CPU utilisation for develop and ~940% for this PR, demonstrating the reduced time in main thread.
Commenting out the sanity checks in bam_read1
for if (c->n_cigar > 0)
onwards increases that CPU utilisation to 1060% (1120% if I get rid of the tag2cigar call), leading to 37s real time. This shows that there is room for improvement, as these sort of checks ought to be executed in the multi-threaded decode instead. We thread the bgzf_read and not the bam_read.
This is why multi-threaded decoding of SAM and SAM.gz is sometimes more performant than BAM, despite the string->integer conversion overheads, as SAM parsing is entirely threadable.
Added a couple of extra tests...
The bgzf_read function is long and not something that's likely to get inlined even if it was in the header.
However most of the time our calls request a small amount of data and they fit within the buffer we've read, so we offer a static inline to do the memcpy when we can, falling back to the long function when we cannot.
In terms of CPU time it's not much difference, but the key thing is that it's often CPU time saved in a main thread given the bulk of the decode is often threaded. An example of test_view -B -@16
develop:
This PR:
So we can see it's a consistent win when threading, potentially 10-15% faster throughput depending on work loads.