kevlar-dev / kevlar

Reference-free variant discovery in large eukaryotic genomes
https://kevlar.readthedocs.io
MIT License

Compose control counttables for `kevlar novel` step #275

Open: standage opened this issue 6 years ago

standage commented 6 years ago

After counting k-mers for each control sample, we should investigate composing the counttables into a single nodetable before running `kevlar novel`. This should have a couple of synergistic benefits.

The cost is, of course, another pass over the "data". But it should be possible to build a nodetable directly from the underlying counttables without iterating over the reads again, so the "data" in this case is just the counttables themselves and should be quite small and manageable.
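To illustrate why no second pass over the reads should be needed, here is a minimal sketch in plain Python. It assumes every control counttable was built with identical table sizes and hash functions, so that bin `b` of table `i` refers to the same k-mers in every sample. `ToyCounttable`, `ToyNodetable`, `bins`, `compose`, `TABLE_SIZES`, and `min_count` are all hypothetical toy names chosen for illustration. They are not khmer's API and not how khmer implements this. The composition is elementwise: a nodetable bit is set if any control's counter in that bin reaches `min_count`.

```python
import hashlib
from typing import List

# Hypothetical toy stand-ins for khmer's Counttable/Nodetable (NOT khmer's API).
# Composition only works if every sample uses the same table sizes and hashes.
TABLE_SIZES = [997, 991, 983]  # one (prime) size per hash table


def bins(kmer: str) -> List[int]:
    """Return one bin index per hash table; identical scheme for every sample."""
    out = []
    for i, size in enumerate(TABLE_SIZES):
        digest = hashlib.sha1(f"{i}:{kmer}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % size)
    return out


class ToyCounttable:
    """Count-Min-style sketch: one array of counters per hash table."""

    def __init__(self):
        self.tables = [[0] * size for size in TABLE_SIZES]

    def add(self, kmer: str) -> None:
        for table, b in zip(self.tables, bins(kmer)):
            table[b] += 1

    def get(self, kmer: str) -> int:
        # Minimum over tables is an upper bound on the true count.
        return min(table[b] for table, b in zip(self.tables, bins(kmer)))


class ToyNodetable:
    """Bloom-filter-style sketch: one presence flag per bin."""

    def __init__(self):
        self.tables = [[False] * size for size in TABLE_SIZES]

    def get(self, kmer: str) -> bool:
        # Present only if the k-mer's bin is set in every table.
        return all(table[b] for table, b in zip(self.tables, bins(kmer)))


def compose(counttables: List[ToyCounttable], min_count: int = 1) -> ToyNodetable:
    """Build a nodetable from counttables alone, without touching any reads.

    Bin (i, b) is set if ANY control's counter at (i, b) is >= min_count.
    """
    node = ToyNodetable()
    for i, size in enumerate(TABLE_SIZES):
        for b in range(size):
            node.tables[i][b] = any(ct.tables[i][b] >= min_count for ct in counttables)
    return node


if __name__ == "__main__":
    ctrl1, ctrl2 = ToyCounttable(), ToyCounttable()
    for _ in range(3):
        ctrl1.add("ACGTACGTACGTACGTACGTA")
    ctrl2.add("TTTTGGGGCCCCAAAATTTTG")
    controls = compose([ctrl1, ctrl2], min_count=2)
    print(controls.get("ACGTACGTACGTACGTACGTA"))  # True
```

In this toy scheme the error is one-sided. Any k-mer whose count reaches `min_count` in some control has all of its bins set in that control, so it is never missed. However, bins set by different controls can combine into a false "present in controls" call. That is the usual Bloom-filter tradeoff, and here it would cost sensitivity rather than specificity.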

standage commented 6 years ago

Most of this would be implemented in khmer-land. See https://github.com/dib-lab/khmer/issues/1379 and https://github.com/dib-lab/khmer/pull/1392 for relevant threads in that project.

standage commented 6 years ago

Investigating this in https://github.com/dib-lab/khmer/pull/1874.