This function is about 7x faster than before, which speeds up low-depth CRAM decoding by around 10% or so. Obviously the time spent in this function is constant regardless of depth, so the deeper the data, the less important its speed becomes.
The two main improvements are:
Drop toupper_c and replace it with "c & ~0x20". This works for ASCII letters, and we already have so many places with char lookup tables (e.g. converting ACGT to 0123) that we're not going to work on mythical EBCDIC systems anyway.
Remove the per-character white-space check. We exploit the knowledge that the FASTA format must have white-space only at the end of lines. The fai index can't work if this isn't true, and I've already tested that samtools faidx fails to query correctly if we have white-space elsewhere.
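As a rough illustration of the two changes above, here is a minimal sketch. It is not the actual load_ref_portion code: the names ascii_upper and copy_ref_upper and their parameters are hypothetical, and it assumes the buffer starts at the first base of the sequence, that every line except possibly the last holds exactly bases_per_line bases, and that the caller never asks for bases beyond the end of the sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Upper-case an ASCII letter by clearing bit 0x20 ('a' is 0x61, 'A' is 0x41).
 * Only meaningful for letters: other bytes (digits, '*', ...) are altered too. */
static inline char ascii_upper(char c) {
    return (char)(c & ~0x20);
}

/* Hypothetical helper: copy `len` bases starting at 0-based base offset
 * `start` out of a raw FASTA sequence buffer, upper-casing on the way.
 * `bases_per_line` and `line_length` play the role of the two line-size
 * columns in a .fai index (bases per line, bytes per line including the
 * terminator). Because white-space only occurs at line ends, we jump over
 * it with arithmetic instead of testing every byte.
 * Returns the number of bases written to `out`. */
static size_t copy_ref_upper(const char *raw, size_t raw_len,
                             size_t bases_per_line, size_t line_length,
                             size_t start, size_t len, char *out) {
    assert(bases_per_line > 0 && line_length >= bases_per_line);

    size_t n = 0;
    size_t line = start / bases_per_line;   /* line holding the first base */
    size_t col  = start % bases_per_line;   /* column within that line     */

    while (n < len) {
        size_t pos = line * line_length + col;   /* byte offset of next base */
        if (pos >= raw_len)
            break;

        size_t run = bases_per_line - col;       /* bases left on this line */
        if (run > len - n)       run = len - n;
        if (run > raw_len - pos) run = raw_len - pos;

        /* Tight inner loop: no white-space test, no table lookup. */
        for (size_t i = 0; i < run; i++)
            out[n + i] = ascii_upper(raw[pos + i]);

        n += run;
        line++;      /* continue at the start of the next line */
        col = 0;
    }
    return n;
}
```

The point of the sketch is that the inner loop is branch-free per byte, which is the kind of loop a compiler can vectorise; whether the real function is structured exactly this way is not something this example claims.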
Some benchmarks are below. I can't explain why mmap is being slow on this system (seq4c). It's not what I've observed before, where mmap is normally the fastest way to load the reference.
10 million records (9827_1#49) at a mean depth of ~1.75x. cram_io.c was built with clang10 (although the rest was probably system gcc).
Reference via mmap is:
Not sure what "_etext" is, but it's a significant CPU portion.
Develop branch:
"_etext" plummets, so it's something related to the mmap, but it's been replaced by a heavy load_ref_portion instead.
Old dev loop, but using &~0x20 instead of toupper_c:
./test/testview -i reference=$HREF -B /tmp/.cram
load_ref_portion dropped from 6482 to 2699.
New loop construction (this PR):
load_ref_portion dropped again from 2699 to 951.