nlapier2 / Metalign

Metalign: efficient alignment-based metagenomic profiling via containment min hash
MIT License

CMash need not be run with kmer sizes down to 30 #4

dkoslicki closed this issue 4 years ago

dkoslicki commented 5 years ago

In this line, we do not need to run CMash with the k-mer range 30-60-10, as we only use 60-mers in the downstream analysis.
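To make the `start-end-step` notation concrete, here is a minimal sketch. It assumes the end value is inclusive, which is how the thread reads (30-60-10 goes "down to 30" and includes the 60-mers, while 60-60-10 yields only 60-mers). `parse_k_range` is a hypothetical helper for illustration, not CMash's own argument parser.

```python
def parse_k_range(spec: str) -> list[int]:
    """Expand a "start-end-step" k-mer range string into a list of k-mer sizes.

    Assumption (hypothetical helper, not CMash's parser): the end value is
    inclusive, so "30-60-10" covers 30, 40, 50 and 60.
    """
    start, end, step = (int(part) for part in spec.split("-"))
    if step <= 0 or start > end:
        raise ValueError(f"invalid k-mer range: {spec!r}")
    # range() excludes its stop value, so add 1 to keep `end` in the result.
    return list(range(start, end + 1, step))


print(parse_k_range("30-60-10"))  # [30, 40, 50, 60]
print(parse_k_range("60-60-10"))  # [60]
```

Under this reading, only the last size in 30-60-10 is used downstream, so the smaller sizes are extra work.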

nlapier2 commented 5 years ago

What should I replace that with? Can I just do "60" or would I have to do something like "60-60-10"? Would I also have to re-do the filter?

dkoslicki commented 4 years ago

You would not need to redo the filter. Also, 60-60-10 will work. Note that it will change the dimensions of the resulting TSV file, but this might not matter if your code currently just takes the last column.
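As a rough illustration of why reading only the last column should, in principle, make the narrower range harmless, here is a minimal sketch. The layout is an assumption: a header row, then one row per reference with the largest k-mer size in the final column. `last_column` is a hypothetical helper and the values are made up.

```python
import csv
import io


def last_column(tsv_text: str) -> dict[str, float]:
    """Map each row's first field (e.g. a reference name) to its last field.

    Assumption: a header row followed by one row per reference, with the
    largest k-mer size in the final column. This layout is hypothetical.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row
    return {row[0]: float(row[-1]) for row in reader if row}


# Made-up outputs for the same sample run with two different k-mer ranges.
wide = "name\tk=30\tk=40\tk=50\tk=60\nrefA\t0.9\t0.8\t0.7\t0.6\n"
narrow = "name\tk=60\nrefA\t0.6\n"

print(last_column(wide) == last_column(narrow))  # True
```

Indexing with `row[-1]` rather than a fixed column number is what keeps the code independent of how many k-mer sizes were requested.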

nlapier2 commented 4 years ago

Changing the StreamingQueryDNADatabase call to 60-60-10 seems to change the results, even though I only took the last column originally. Does this make any sense?

dkoslicki commented 4 years ago

That's odd! It really shouldn't change the results, but I would need to look into this further.

dkoslicki commented 4 years ago

This issue has now been fixed. See here for more details. The fix will be included in the next release of CMash. In short, reverse complements were not being added to the ternary search trie, and there were a few other insidious bugs.
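To illustrate why omitting reverse complements matters, here is a minimal, generic sketch, not CMash's code. It builds a small ternary search trie from a reference's k-mers, optionally inserting each k-mer's reverse complement as well, and then checks what fraction of a read's k-mers are found when the read comes from the opposite strand. The sequences, `k`, and all function names are hypothetical, and the "fraction found" score is a simplification, not CMash's containment estimate.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Node:
    char: str
    lo: Optional[Node] = None
    eq: Optional[Node] = None
    hi: Optional[Node] = None
    end: bool = False  # True if a stored key ends at this node


class TernarySearchTrie:
    """Minimal ternary search trie with insert and exact lookup (non-empty keys)."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: str) -> None:
        self.root = self._insert(self.root, key, 0)

    def _insert(self, node: Optional[Node], key: str, i: int) -> Node:
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.char:
            node.lo = self._insert(node.lo, key, i)
        elif ch > node.char:
            node.hi = self._insert(node.hi, key, i)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1)
        else:
            node.end = True
        return node

    def __contains__(self, key: str) -> bool:
        node, i = self.root, 0
        while node is not None:
            ch = key[i]
            if ch < node.char:
                node = node.lo
            elif ch > node.char:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node.end
        return False


def kmers(seq: str, k: int) -> Iterator[str]:
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_trie(reference: str, k: int, add_reverse_complements: bool) -> TernarySearchTrie:
    trie = TernarySearchTrie()
    for kmer in kmers(reference, k):
        trie.insert(kmer)
        if add_reverse_complements:
            trie.insert(reverse_complement(kmer))
    return trie


def fraction_found(read: str, trie: TernarySearchTrie, k: int) -> float:
    found = [kmer in trie for kmer in kmers(read, k)]
    return sum(found) / len(found)


reference = "ACGTTGCAAGGCTTAC"  # made-up sequence
read = reverse_complement(reference[2:12])  # a read from the opposite strand
k = 4
print(fraction_found(read, build_trie(reference, k, False), k))  # below 1.0: some k-mers missed
print(fraction_found(read, build_trie(reference, k, True), k))  # 1.0
```

Every k-mer of an opposite-strand read is the reverse complement of a reference k-mer, so it is only guaranteed to be found when reverse complements are stored too.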