iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

Retire variant-aware kmer indexing code #159

Closed bricoletc closed 1 year ago

bricoletc commented 2 years ago

This issue is to leave a trace of the upcoming retirement of a large chunk of code.

Robyn (and/or Carlos) wrote code to look for kmers overlapping variant sites and index only those (subject to a --max_read_size parameter, which is the upper limit on how far to extend kmers overlapping variant sites). It worked very well on simple graphs (those without many clustered variants). However, it suffers from:

For those reasons, we enumerate all kmers of a given size, regardless of the prg under study. This is brute force, and wasteful when the prg contains, for example, a single SNP. Nonetheless, in my opinion the variant-aware code is complex enough that it would need a full rewrite, so I will remove it.
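
To make the trade-off concrete, here is a rough Python sketch, not gramtools code. It contrasts brute-force enumeration of all kmers with collecting only the kmers that overlap a single SNP. The function names `all_kmers` and `snp_overlapping_kmers` are hypothetical, and the second function assumes a toy linear reference with one SNP rather than a real prg.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Yield every string of length k over the alphabet (4**k for DNA)."""
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)


def snp_overlapping_kmers(ref, pos, alt, k):
    """Return the kmers covering a single SNP at 0-based position pos.

    Toy model (an assumption for illustration): a linear reference with
    one SNP, whose alleles are ref[pos] and alt.
    """
    kmers = set()
    for allele in (ref[pos], alt):
        seq = ref[:pos] + allele + ref[pos + 1:]
        # A window starting at `start` covers pos when start <= pos <= start + k - 1.
        first = max(0, pos - k + 1)
        last = min(pos, len(seq) - k)
        for start in range(first, last + 1):
            kmers.add(seq[start:start + k])
    return kmers


brute_force = list(all_kmers(3))
variant_aware = snp_overlapping_kmers("ACGTACGTAC", pos=4, alt="T", k=3)
print(len(brute_force), len(variant_aware))
```

In this toy case the brute-force set has 4**3 = 64 kmers while at most 2 * k = 6 overlap the SNP, and the gap widens exponentially with k. The sketch says nothing about clustered variants, which is where the retired code ran into trouble.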