Open Midnighter opened 1 year ago
@Midnighter Any progress on this? I'm also interested in having it for evaluation.
According to the docs, sourmash tax is the recommended approach (not sourmash lca anymore). I don't have example commands to use, but there's a fairly recent nf wf that might be helpful if the cmds themselves are what's slowing you down here. It looks like the main steps are:
1. sourmash sketch
2. sourmash gather
3. sourmash tax metagenome
4. sourmash tax annotate
where steps 3 and 4 could occur in parallel.
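For anyone who wants to try the four steps listed above, here is a minimal sketch that only builds the command lines and prints them, without running anything. The file names, the helper name sourmash_commands, and the sketch parameters (k=31, scaled=1000, abund) are assumptions for illustration; the k-mer size in particular has to match whatever database is used, so please check the sourmash docs before relying on any of this.

```python
import shlex


def sourmash_commands(reads, database, lineages, prefix):
    """Build (but do not run) the four sourmash invocations as argv lists.

    All arguments are hypothetical file names or prefixes chosen by the caller.
    """
    signature = f"{prefix}.sig.zip"
    gather_csv = f"{prefix}.gather.csv"
    return [
        # 1. Sketch the reads; the parameter string is an illustrative choice.
        ["sourmash", "sketch", "dna", "-p", "k=31,scaled=1000,abund",
         reads, "-o", signature],
        # 2. Search the sketch against a reference database.
        ["sourmash", "gather", signature, database, "-o", gather_csv],
        # 3. Summarise the gather result taxonomically.
        ["sourmash", "tax", "metagenome",
         "--gather-csv", gather_csv, "--taxonomy-csv", lineages],
        # 4. Annotate the gather matches with lineages
        #    (independent of step 3, so both could run in parallel).
        ["sourmash", "tax", "annotate",
         "--gather-csv", gather_csv, "--taxonomy-csv", lineages],
    ]


if __name__ == "__main__":
    for argv in sourmash_commands("reads.fq.gz", "db.zip", "lineages.csv", "sample"):
        print(shlex.join(argv))
```

Steps 3 and 4 both consume only the gather CSV and the lineage file, which is why they do not depend on each other.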
The bioconda package is up to date, the databases are well described, and the software itself has been very well maintained by @ctb et al. for almost a decade now. I'm also including him to give him an opportunity to suggest alternative cmds for generalized classification, in case the above steps are less than ideal.
I STAND READY
😆
can anyone give an example of one or two use cases so I can read the docs a bit with that in mind? would the standardize command be a good place to start?
might be fun to add sylph support as well, since people are liking that a lot (I'm not a maintainer - that would be @bluenote-1577)
Thank you for your interest @chrisgulvik 🙂. As this is taxpasta and not the taxprofiler pipeline, the exact commands actually don't matter in this context. The only things required from a technical perspective are a few example profiles created with sourmash and maybe a clear understanding of what variation in column output is possible/desirable/supportable.
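To get a first idea of that column variation, something like the following sketch could be pointed at a handful of example profiles. It only compares header rows and makes no assumption about which columns sourmash actually writes; the function names and the default comma delimiter are my own choices here, not anything from taxpasta or sourmash.

```python
import csv


def read_header(path, delimiter=","):
    """Return the first row of a delimited text file as a list of column names."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def compare_columns(paths, delimiter=","):
    """Report which columns occur in every profile and which only in some."""
    column_sets = [set(read_header(path, delimiter)) for path in paths]
    if not column_sets:
        return {"shared": [], "variable": []}
    shared = set.intersection(*column_sets)
    everything = set.union(*column_sets)
    return {
        "shared": sorted(shared),
        "variable": sorted(everything - shared),
    }
```

Columns that end up under "shared" are candidates for a required schema, while the "variable" ones would need a decision on whether they are optional or unsupported.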
The major impediment is my time really, as I have moved into a different job, and taxpasta is now essentially a hobby project among (several) others. We have a fairly decent guide for how to add support for new types of profiles (https://taxpasta.readthedocs.io/en/latest/contributing/supporting_new_profiler/), so if you want to give it a shot, I'm happy to provide guidance and review code.
Agreed! A sourmash subworkflow was actually already started on the taxprofiler repo (it's in a draft state at the moment), but the person taking that on seems to have not been able to finish it. On 'our side' we normally add tools to taxpasta once they're in the pipeline, as then we know exactly what is available etc.
That said, I'm also happy to guide on the taxprofiler/nextflow side of things (I'm still on half-time parental leave until August), if someone wants to take over the half-done subworkflow! We have a profiler contribution guide for that too.
And agreed sylph also looks very interesting 👍
I think sourmash is an interesting tool, as it is so fast in scanning vast libraries of genomes. We should add support for its output.