iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

Kmer index unit test assigns an allele ID of zero to search state #139

Closed: ffranr closed this issue 5 years ago

ffranr commented 5 years ago

Unit test: https://github.com/iqbal-lab-org/gramtools/blob/ad47bb63a09c4b0b1f92f005e2059ce842223567/libgramtools/tests/kmer_index/test_build.cpp#L671

This function is called whilst running the above unit test: https://github.com/iqbal-lab-org/gramtools/blob/03ff4ff2309f25f5470816b3097d60ea7f0a9754/libgramtools/src/search/search.cpp#L336

Within the above function, a call to get_allele_id(.) returns zero.

The allele mask is generated and seems reasonable. The first step in the investigation is to generate PRG table information (SA, F, ...).
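As a rough illustration of the table information mentioned above, here is a minimal Python sketch that builds an allele mask, a suffix array (SA) and the F column for a small integer-encoded PRG. It is not gramtools code. It assumes an encoding in which bases are 1 to 4, odd integers from 5 upwards open and close a variant site, and the following even integer separates alleles; all function names are hypothetical. Under these assumptions every base inside a site gets a 1-based allele ID, so a zero would only be expected outside sites or on markers.

```python
from typing import List, Tuple


def build_allele_mask(encoded_prg: List[int]) -> List[int]:
    """Per PRG position: 1-based allele ID inside a site, 0 elsewhere.

    Assumed encoding (not taken from gramtools): bases are 1..4,
    odd markers >= 5 delimit a site, even markers separate alleles.
    """
    mask = []
    in_site = False
    allele_id = 0
    for symbol in encoded_prg:
        if symbol <= 4:
            # A base: labelled with the current allele only inside a site.
            mask.append(allele_id if in_site else 0)
        elif symbol % 2 == 1:
            # Odd marker: toggles between entering and leaving a site.
            in_site = not in_site
            allele_id = 1 if in_site else 0
            mask.append(0)
        else:
            # Even marker: the next allele of the same site starts.
            allele_id += 1
            mask.append(0)
    return mask


def suffix_array(encoded_prg: List[int]) -> List[int]:
    """Naive suffix array: start positions sorted by suffix (fine for tiny inputs)."""
    return sorted(range(len(encoded_prg)), key=lambda i: encoded_prg[i:])


def prg_table(encoded_prg: List[int]) -> List[Tuple[int, int, int, int]]:
    """Rows of (row index, SA value, F symbol, allele ID at that PRG position)."""
    mask = build_allele_mask(encoded_prg)
    sa = suffix_array(encoded_prg)
    return [(i, pos, encoded_prg[pos], mask[pos]) for i, pos in enumerate(sa)]


if __name__ == "__main__":
    # Hypothetical PRG "a5c6g6t5a": one site with alleles c, g and t.
    for row in prg_table([1, 5, 2, 6, 3, 6, 4, 5, 1]):
        print(row)
```

Printing the rows side by side makes it easy to see which SA intervals cover site bases and which allele ID each of them should carry.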

iqbal-lab commented 5 years ago

Hey @rffrancon @bricoletc, when would one legitimately have discontinuous marker IDs? Maybe if we broke a PRG into chunks for parallelising?

bricoletc commented 5 years ago

> Hey @rffrancon @bricoletc, when would one legitimately have discontinuous marker IDs? Maybe if we broke a PRG into chunks for parallelising?

Could we perhaps decide to delete a variant site from an existing encoded PRG? Might a user pass a discontinuous PRG to the build command (instead of reference + VCF)?

iqbal-lab commented 5 years ago
  1. Deleting a site is tricky: you would need to leave a ref allele, or matching would go wrong there, and it would affect adjacent sites.
  2. I'm wondering why it would ever be justified to allow a discontinuous PRG (I would have said veto it). The only thing I can think of is partitioning a big PRG for parallelising.
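
If discontinuous PRGs were vetoed, the check could look something like the sketch below. It is a minimal illustration, not gramtools code, and it assumes that site markers are the odd integers 5, 7, 9, ... in an integer-encoded PRG; the function name is hypothetical.

```python
from typing import List


def missing_site_markers(encoded_prg: List[int]) -> List[int]:
    """Return the odd site markers skipped between 5 and the largest marker seen.

    Assumption (not taken from gramtools): bases are 1..4 and site markers
    are consecutive odd integers starting at 5.
    """
    site_markers = {s for s in encoded_prg if s > 4 and s % 2 == 1}
    if not site_markers:
        return []
    expected = range(5, max(site_markers) + 1, 2)
    return [marker for marker in expected if marker not in site_markers]


if __name__ == "__main__":
    # Hypothetical PRG in which site 7 was deleted, leaving sites 5 and 9.
    gaps = missing_site_markers([1, 5, 2, 6, 3, 5, 1, 9, 2, 10, 4, 9])
    if gaps:
        print("discontinuous PRG, missing site markers:", gaps)
```

A build step could reject the input when the returned list is non-empty, or renumber the markers if partitioned PRGs ever need to be supported.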