This is to leave a record of the upcoming retirement of a large chunk of code.
Robyn (and/or Carlos) wrote code that looks for kmers overlapping variant sites and indexes only those (subject to a --max_read_size parameter, which is an upper limit on how far to extend kmers overlapping variant sites). It worked very well on simple graphs (few clustered variants). However, it suffers from:
- blowup: clustered variants can cause an exponential enumeration of consecutive kmers.
- scalability: it does not scale to PRGs with nesting. The whole process analyses a PRG string (not a graph), which cannot contain nested variants, so scaling to nested PRGs would require changing almost all the code.
- code complexity: it is too complex to be changed easily in its current form.
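The blowup from clustered variants can be illustrated with a small sketch. It assumes a hypothetical, simplified linear PRG where each segment is either invariant sequence or a list of alternative alleles (no nesting, like a flat PRG string). The names `enumerate_paths`, `path_kmers` and `clustered_snps` are made up for illustration and are not part of the codebase being retired.

```python
from itertools import product


def enumerate_paths(segments):
    """Yield every sequence spelled out by picking one allele per variant site.

    `segments` is a hypothetical, simplified linear PRG: each element is either
    a str (invariant sequence) or a list of str (alternative alleles at one
    site). Nested sites cannot be represented, much like a flat PRG string.
    """
    choices = [[seg] if isinstance(seg, str) else seg for seg in segments]
    for combo in product(*choices):
        yield "".join(combo)


def path_kmers(segments, k):
    """Collect the distinct k-mers found along every path through the PRG."""
    kmers = set()
    for path in enumerate_paths(segments):
        for i in range(len(path) - k + 1):
            kmers.add(path[i:i + k])
    return kmers


def clustered_snps(n):
    """Hypothetical PRG: n biallelic C/G SNPs, each followed by one invariant base."""
    segments = ["A"]
    for _ in range(n):
        segments.append(["C", "G"])
        segments.append("T")
    return segments


for n in (1, 5, 10):
    prg = clustered_snps(n)
    k = 1 + 2 * n  # k-mer length that spans every clustered site
    n_paths = sum(1 for _ in enumerate_paths(prg))
    print(n, n_paths, len(path_kmers(prg, k)))
# prints:
# 1 2 2
# 5 32 32
# 10 1024 1024
```

Each extra SNP inside the window doubles the number of paths, and therefore the number of distinct kmers that span the whole cluster, which is the 2^n growth behind the blowup.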
For these reasons, we enumerate all kmers of a given size regardless of the PRG under study. This is brute force, and wasteful when the PRG contains, for example, a single SNP. Nonetheless, in my opinion the code is complex enough that it needs a full rewrite, so I will remove it.
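To make the cost of the brute-force approach concrete, here is a minimal sketch. It assumes that "all kmers of a given size" means every string over {A, C, G, T} of length k; that reading is an assumption, not something stated explicitly here. For comparison, the hypothetical helper `snp_kmers` keeps only the kmers overlapping a single SNP, which is roughly what a variant-aware index would need.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Yield every possible k-mer over `alphabet`, independently of any PRG."""
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)


def snp_kmers(left, alleles, right, k):
    """Distinct k-mers overlapping a single SNP flanked by `left` and `right`.

    Hypothetical helper for comparison only: it keeps just the windows that
    contain the SNP position, for each allele.
    """
    kmers = set()
    pos = len(left)  # index of the SNP base in left + allele + right
    for allele in alleles:
        seq = left + allele + right
        first = max(0, pos - k + 1)
        last = min(pos, len(seq) - k)
        for start in range(first, last + 1):
            kmers.add(seq[start:start + k])
    return kmers


k = 5
brute_force = sum(1 for _ in all_kmers(k))
targeted = snp_kmers("ACGTACGTAC", ["A", "G"], "TTGCATGCAA", k)
print(brute_force, len(targeted))  # prints: 1024 10
```

The brute-force set grows as 4^k however few variant sites the PRG has, while a biallelic SNP with long enough flanks contributes at most k windows per allele.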