nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

Genome taxonomic classification with Sourmash #609

Open prototaxites opened 3 months ago

prototaxites commented 3 months ago

Description of feature

I was reading the sourmash documentation and noticed that it can be used for genome classification. It could be a nice one to have: there are pre-made databases available for both GTDB and GenBank, which means it could work for both prokaryotes and eukaryotes without too much user intervention.

https://sourmash.readthedocs.io/en/latest/tutorials-lca.html
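For anyone unfamiliar with the idea, here is a rough toy sketch of how sketch-based genome classification works in principle. It is not sourmash's API or its actual algorithm: the hash function, `K`, `SCALED`, the function names and the lineage strings are all made up for illustration, and it skips details such as canonical (reverse-complement) k-mers. It keeps only a fraction of the k-mer hashes of each genome and assigns the query to the reference whose sketch contains most of the query's hashes.

```python
import hashlib
import random

K = 21                      # toy k-mer size (assumption, not a sourmash default)
SCALED = 4                  # keep roughly 1 in SCALED hashes (real sketches use far larger values)
MAX_HASH = 2**64 // SCALED  # hashes below this value are kept in the sketch


def sketch(seq, k=K):
    """Return the set of retained k-mer hashes for a sequence."""
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # First 8 bytes of SHA-1 as a 64-bit integer hash (illustrative choice).
        h = int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        if h < MAX_HASH:
            hashes.add(h)
    return hashes


def containment(query, ref):
    """Fraction of the query's hashes that are also present in the reference."""
    return len(query & ref) / len(query) if query else 0.0


def classify(query_sketch, references, threshold=0.5):
    """Return (lineage, score) of the best reference, or (None, score) below threshold."""
    best_lineage, best_score = None, 0.0
    for lineage, ref_sketch in references.items():
        score = containment(query_sketch, ref_sketch)
        if score > best_score:
            best_lineage, best_score = lineage, score
    if best_score < threshold:
        return None, best_score
    return best_lineage, best_score


# Two random "reference genomes" with hypothetical lineage labels.
random.seed(0)
genome_a = "".join(random.choices("ACGT", k=2000))
genome_b = "".join(random.choices("ACGT", k=2000))
references = {
    "d__Bacteria;hypothetical_A": sketch(genome_a),
    "d__Bacteria;hypothetical_B": sketch(genome_b),
}

# The query is a fragment of genome A, standing in for a bin/MAG.
query = sketch(genome_a[500:1500])
best = classify(query, references)
print(best)
```

In a real setup the reference sketches would come from the pre-made databases mentioned above rather than being computed on the fly, and the tool would handle the taxonomy lookup.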

erikrikarddaniel commented 3 months ago

This is fundamental in #magmap. We take either a set of user-supplied genomes that are sketched, a set of ready-made sketches (e.g. GTDB), or both, then submit the selected set to mapping. That is basically the whole pipeline in version 1.0, which is relatively close to release.

jfy133 commented 3 months ago

Ehehehe https://github.com/nf-core/taxprofiler/pull/404

jfy133 commented 3 months ago

That PR has been abandoned... but I think this would be a good case for a common subworkflow :D