iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

Optimize site marker mask memory use #97

Closed: iqbal-lab closed this 3 years ago

iqbal-lab commented 6 years ago

Logging this idea here:

The mask only needs two characters to mark site and allele boundaries, plus rank to track which site you are in. The PRG needs the actual site numbers so that alleles sort together, but the mask does not. So the mask can use a 3-bit encoding.
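
To make the idea concrete, here is a minimal Python sketch; it is not gramtools code. It assumes a non-nested, integer-encoded PRG in which nucleotides are 1 to 4, each site is opened and closed by the same odd marker (5, 7, ...), alleles are separated by the following even marker, and sites appear in increasing marker order. The names `build_compact_mask`, `site_rank` and `site_marker_at` are hypothetical. The mask keeps only "site boundary" and "allele boundary" symbols (plus "neither"), and the site number is recovered from a rank over site boundaries. A real implementation would presumably use a succinct rank structure rather than a plain prefix-count list.

```python
from itertools import accumulate

# Mask symbols (assumed for illustration): no site numbers are stored.
NONE, SITE, ALLELE = 0, 1, 2


def build_compact_mask(prg):
    """Map each PRG integer to a mask symbol, dropping the site numbers.

    Assumed encoding: 1-4 are nucleotides, odd values >= 5 are site
    markers, even values >= 6 are allele markers.
    """
    mask = []
    for x in prg:
        if x <= 4:
            mask.append(NONE)
        elif x % 2 == 1:
            mask.append(SITE)
        else:
            mask.append(ALLELE)
    return mask


def site_rank(mask):
    """rank[i] = number of SITE symbols in mask[0..i] inclusive."""
    return list(accumulate(1 if m == SITE else 0 for m in mask))


def site_marker_at(mask, rank, i):
    """Recover the site marker covering position i, or None outside any site.

    Each site contributes two SITE symbols (open and close), so an odd rank
    means we are inside a site, and a SITE symbol with even rank closes one.
    """
    if mask[i] == SITE or rank[i] % 2 == 1:
        site_index = (rank[i] - 1) // 2
        return 5 + 2 * site_index
    return None


# A 5 C 6 G 5 T 7 A 8 C 7  (two sites, markers 5 and 7)
prg = [1, 5, 2, 6, 3, 5, 4, 7, 1, 8, 2, 7]
mask = build_compact_mask(prg)
rank = site_rank(mask)
markers = [site_marker_at(mask, rank, i) for i in range(len(prg))]
```

In this example, `markers` gives back 5 for every position of the first site, 7 for every position of the second, and `None` for the nucleotides outside both sites, without the mask ever storing a site number.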

bricoletc commented 4 years ago

As of https://github.com/iqbal-lab-org/gramtools/releases/tag/v1.6.0 we no longer use site and allele masks on the PRG string.

They can still be built in the kmer indexing code, which is currently deprecated. Once we reach release 1.8 (https://github.com/iqbal-lab-org/gramtools/projects/2), we can decide whether to close this.

bricoletc commented 3 years ago

Now that we have nesting in v1.7.0, the site mask will no longer be used.