nf-core / taxprofiler

Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data
https://nf-co.re/taxprofiler
MIT License

Adding GetOrganelle #414

Closed erinyoung closed 3 months ago

erinyoung commented 8 months ago

Description of feature

GetOrganelle is a great tool for identifying organelles (like mitochondria).

My use case is sequencing from a mosquito pool and mitochondria can be more effective in identifying blood source.

The command for my use-case is something like

get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -R 10 -k 21,45,65,85,105 -F animal_mt -o animal_mt_out   

GetOrganelle has more features and use cases than this, however (see https://github.com/Kinggerm/GetOrganelle#recipes).
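For anyone wiring the command above into a script, here is a minimal sketch that assembles the same argument list in Python. Only the flags that appear in the command above are used. The function name `build_get_organelle_command` and its parameter names are hypothetical, chosen for illustration, and the comments on what each flag does are my reading of the GetOrganelle usage notes rather than anything confirmed in this thread.

```python
import shlex


def build_get_organelle_command(forward, reverse, out_dir,
                                database="animal_mt",
                                rounds=10,
                                kmers=(21, 45, 65, 85, 105)):
    """Return the argument list for get_organelle_from_reads.py.

    Hypothetical helper: the parameter names are illustrative only.
    The flags mirror the command quoted in this issue.
    """
    return [
        "get_organelle_from_reads.py",
        "-1", str(forward),                     # forward reads
        "-2", str(reverse),                     # reverse reads
        "-R", str(rounds),                      # assumed: extension rounds
        "-k", ",".join(str(k) for k in kmers),  # assumed: assembly k-mer sizes
        "-F", database,                         # target organelle type
        "-o", str(out_dir),                     # output directory
    ]


cmd = build_get_organelle_command("forward.fq", "reverse.fq", "animal_mt_out")
# Print a shell-safe rendering of the command; pass `cmd` itself to
# subprocess.run(cmd, check=True) to actually execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with unusual file names.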

There are some nuances with this tool, though (for example, it should ideally run after host removal, but the mitochondrial sequences can't be filtered out during host removal).

Right now, there's not an nf-core module for GetOrganelle, but I can put one together.

jfy133 commented 8 months ago

Hmmm, that's an interesting one. It half fits the scope, but I'm a little wary of the assembly bit. @nf-core/taxprofiler what do you think? (and @maxibor ?)

sofstam commented 8 months ago

I am not sure what category this tool would fall into. It seems a bit specific to me, and I agree regarding the assembly part.

jfy133 commented 8 months ago

So to me basically it:

I think conceptually this would actually fit. Just rather than short-read alignment or kmer-comparison, it does 'long-read' comparison to a database (the main difference is that it generates the 'long reads' itself).

Midnighter commented 8 months ago

Given that this is the first time we see this request, maybe it'd make sense for @erinyoung to adapt the taxprofiler pipeline for their purposes as a proof-of-concept, and then we decide if/how to adopt it?

jfy133 commented 8 months ago

> Given that this is the first time we see this request, maybe it'd make sense for @erinyoung to adapt the taxprofiler pipeline for their purposes as a proof-of-concept, and then we decide if/how to adopt it?

What do you mean by PoC - as in make a fork, add it, and see if it makes sense?

I think conceptually it does what we want (I just need to check the output), it's just outside our typical direct kmer/alignment of reads concept

jfy133 commented 8 months ago

I just had a quick look: @erinyoung does the tool produce an OTU/taxon-like table as output at all? I looked through and couldn't find anything like that. The closest thing to a table listed gene loci rather than species.

Midnighter commented 8 months ago

> What do you mean by PoC - as in make a fork, add it, and see if it makes sense?

By PoC, I meant adding the modules and making the data flow adjustments needed to get the pipeline to work for that purpose, yes.

erinyoung commented 7 months ago

I created an nf-core module for GetOrganelle (https://github.com/nf-core/modules/pull/4484). The output is a FASTA file with either complete or partial organelle/plasmidome sequences.
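To make concrete what a FASTA-only output offers downstream, here is a small sketch that lists each sequence ID and its length. It is not part of the module or of taxprofiler. The function name `summarise_fasta` and the example records are hypothetical. The rows it produces describe assembled sequences, not taxa, so a table like this would still need a separate classification step.

```python
from io import StringIO


def summarise_fasta(handle):
    """Yield (record_id, length) for each FASTA record in a text handle.

    Hypothetical helper for illustration; record_id is the first
    whitespace-delimited token of the header line.
    """
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, length
            tokens = line[1:].split()
            name, length = (tokens[0] if tokens else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


# Made-up records standing in for an assembler's FASTA output.
example = StringIO(">contig_1 hypothetical\nACGTACGT\nACGT\n>contig_2\nGGCC\n")
rows = list(summarise_fasta(example))
for record_id, n_bases in rows:
    print(f"{record_id}\t{n_bases}")
```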

jfy133 commented 7 months ago

Thanks @erinyoung !

So if the output of the module is simply FASTA files, I don't consider that in scope for taxprofiler, as that means it is just an assembler.

However I saw there is this utility function: https://github.com/Kinggerm/GetOrganelle/wiki/Usage#summary_get_organelle_outputpy

Depending on what the output of that looks like, this may sort of make it fit.

erinyoung commented 3 months ago

My apologies, but I've encountered other priorities. I may get back to this issue at a later date, but I am closing it for now.