iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License
93 stars 15 forks source link

Precalculate some/all ranks to speed up mapping #1

Closed iqbal-lab closed 7 years ago

iqbal-lab commented 9 years ago

This from Sorina

"I just realised something that could potentially speed up the mapping. Those ranks you need for jumping from the last column to the first column in the Burrows Wheeler matrix. In the standard FM-index algorithm they don’t compute them right at the time when you’re mapping each character from each read, because you’d have to compute the same thing multiple times, every time a read covers that region. Instead, what they do is precalculate all ranks for all letters and store them. Because this takes up quite a lot of memory, they actually store only a fraction of the rows and infer the gaps from the nearest stored value. I thought that’s what the sdsl guys are doing as well when I’m calling the rank function, they’re just accessing the structure that stores the ranks. But no, I just digged through their code and they’re actually calculating the rank on the spot. So I think it’s worth trying this out."

iqbal-lab commented 9 years ago

I think good idea - do after first implementation running and profiled.

iqbal-lab commented 7 years ago

Can we close @sm0179 ?