durrantmm closed this issue 2 years ago.
I didn't test it with MAGs, but I think it works, even with CDS.
From section 2.1, "Indexing", of the KMCP preprint (https://www.biorxiv.org/content/10.1101/2022.03.07.482835v2):
"KMCP efficiently builds a database from a collection of genome sequences and taxonomic information. The microbial genomes are split into ten (for archaea, bacteria, and fungi) or five (for viruses) chunks with 100-bp overlap, and the k-mer location information is further utilized in taxonomic profiling. For genomes without a single complete genome sequence, chromosomes or contigs are concatenated with intervals of k-1 bases of N to avoid introducing fake k-mers."
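To illustrate the chunking step in the quote, here is a minimal Python sketch. It assumes equal-length chunks, each extending 100 bp into the next one. The function name `split_into_chunks` and the exact boundary arithmetic are hypothetical; this is not KMCP's actual code. One consequence of this scheme is that when the overlap is at least k-1, every k-mer of the genome lies wholly inside at least one chunk. Here that holds for any k up to 101.

```python
import math


def split_into_chunks(seq: str, n_chunks: int, overlap: int = 100) -> list[str]:
    """Hypothetical chunking scheme: cut seq into n_chunks equal-sized pieces,
    letting each piece extend `overlap` bases into the following one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    size = math.ceil(len(seq) / n_chunks)
    chunks = []
    for i in range(n_chunks):
        start = i * size
        if start >= len(seq):  # short sequence: fewer chunks than requested
            break
        end = min(start + size + overlap, len(seq))  # extend into the next chunk
        chunks.append(seq[start:end])
    return chunks


if __name__ == "__main__":
    # Toy 1,000-bp "genome" split into 10 chunks, as the quote describes for bacteria
    genome = "".join("ACGT"[(i * i + 3 * i) % 7 % 4] for i in range(1000))
    chunks = split_into_chunks(genome, 10, overlap=100)
    print(len(chunks), [len(c) for c in chunks])  # 10 [200, 200, ..., 200, 100]
```

For a viral genome you would pass `n_chunks=5` instead, following the quoted text.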
Some genomes in GTDB are draft genomes made up of multiple contigs, and a collection of CDS sequences could be treated as contigs in the same way.
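The following Python sketch shows why joining contigs (or CDS sequences) with a run of k-1 `N` bases avoids fake k-mers. It assumes that k-mers containing `N` are skipped during counting. Any window that spans a junction then overlaps the `N` run and is discarded, so the concatenation yields exactly the k-mers of the individual pieces. Plain concatenation, by contrast, creates junction k-mers that do not exist in any piece. The names `concat_with_spacers` and `valid_kmers` are hypothetical illustrations, not KMCP functions.

```python
def concat_with_spacers(contigs: list[str], k: int) -> str:
    # Join pieces with k-1 'N' bases so every window crossing a junction contains an N
    return ("N" * (k - 1)).join(contigs)


def valid_kmers(seq: str, k: int) -> set[str]:
    # Collect all k-mers, skipping any that contain the ambiguous base 'N'
    return {seq[i:i + k] for i in range(len(seq) - k + 1) if "N" not in seq[i:i + k]}


if __name__ == "__main__":
    naive = "AAAAAA" + "CCCCCC"
    spaced = concat_with_spacers(["AAAAAA", "CCCCCC"], k=4)
    print(sorted(valid_kmers(naive, 4)))   # ['AAAA', 'AAAC', 'AACC', 'ACCC', 'CCCC']
    print(sorted(valid_kmers(spaced, 4)))  # ['AAAA', 'CCCC']
```

The naive join produces the fake junction k-mers `AAAC`, `AACC`, and `ACCC`. The spaced join keeps only k-mers that really occur in one of the input sequences.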
Great, thank you!
Can I expect the tool to work if I use incomplete draft genomes or MAGs as inputs? What if I use collections of CDS sequences rather than assemblies?