nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Add additional Pacbio Processing Path #106

Closed (apeltzer closed this issue 3 years ago)

apeltzer commented 4 years ago

We had discussions with Anders @andand, Daniel @erikrikarddaniel and Jeanette @jtangrot about adding this at the Stockholm hackathon.

It would be an additional path in the existing ampliseq workflow, for a future release, to add capabilities that are currently missing.

apeltzer commented 4 years ago

Adding in @d4straub to get him aboard too :-)

erikrikarddaniel commented 4 years ago

We have discussed how to technically integrate this, and lean towards writing one or more R scripts that do long read denoising. The idea would be to run this as part of the workflow, after primer removal with cutadapt, when users specify PacBio or similar, instead of the normal QIIME2 processing. At the end of the script, a QIIME2 artefact would be output and the rest of the workflow could continue. Probably, this would be just before taxonomy assignment.

(Primer removal might be somewhat different, since the PacBio reads apparently contain sequences upstream of the primer (both forward and reverse) that should also be deleted. A flag to cutadapt, I suppose.)

We will start working on the R script(s) after Christmas and after we agree on a plan. We're all happy to discuss!

(Adding @DiegoBrambilla too.)

d4straub commented 4 years ago

Probably dada2:::removePrimers is better suited than cutadapt. This function can also immediately re-orient PacBio reads, e.g. dada2:::removePrimers(rawFILES[i], trimmedFILES[i], primer.fwd=primerFW, primer.rev=dada2:::rc(primerRV), orient=TRUE)
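To make the re-orientation idea concrete, here is a minimal Python sketch of what "orient and trim" means for a single read. It is not taken from dada2 or cutadapt and does not reproduce their matching logic. It assumes exact primer matches (no mismatches, indels or IUPAC codes), and the function names and toy primers are hypothetical. A read is searched as given and, if that fails, as its reverse complement. Everything up to and including the forward primer, and everything from the reverse primer onwards, is dropped, so upstream sequence is removed as well.

```python
# Illustrative sketch only: exact matching, hypothetical names and toy primers.
# Real tools (dada2::removePrimers, cutadapt) tolerate mismatches; this does not.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_and_trim(read: str, primer_fwd: str, primer_rev: str):
    """Return the sequence between the primers in forward orientation.

    primer_rev is given 5'->3' as ordered, so on the forward strand it
    appears as its reverse complement. Returns None if the primers are
    not found in either orientation.
    """
    rev_on_fwd_strand = reverse_complement(primer_rev)
    # Try the read as given, then its reverse complement.
    for candidate in (read, reverse_complement(read)):
        start = candidate.find(primer_fwd)
        if start == -1:
            continue
        insert_start = start + len(primer_fwd)
        end = candidate.find(rev_on_fwd_strand, insert_start)
        if end == -1:
            continue
        # Drops the primers plus anything upstream/downstream of them.
        return candidate[insert_start:end]
    return None


# Toy example with made-up primers and a made-up read.
fwd, rev = "ACGTAC", "CCATGA"
read = "TTT" + fwd + "GATTACA" + reverse_complement(rev) + "AA"
print(orient_and_trim(read, fwd, rev))                      # GATTACA
print(orient_and_trim(reverse_complement(read), fwd, rev))  # GATTACA
```

Both orientations of the toy read yield the same trimmed sequence, which is the property that matters for PacBio reads arriving in mixed orientation.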

nbargues commented 4 years ago

I'm currently working on a workflow to analyse ONT full-length 16S sequences using QIIME2, based on this project [https://github.com/DeniRibicic/q2ONT]. Are you considering adding Nanopore data to the processing path?

d4straub commented 4 years ago

Hi @nbargues, this looks interesting. ~Yes, Nanopore processing should be included one day. As far as I can see you are using vsearch but dada2 is planned for this pipeline.~ Sorry, I am wrong, that's not planned at the moment.

nbargues commented 4 years ago

@d4straub Thanks for the response. I read that nanopore data is not supported by DADA2 currently and neither is the other denoising method Deblur. That's why vsearch is used.

d4straub commented 4 years ago

@nbargues Oh! I am so sorry, I somehow mixed PacBio and Nanopore!! You are right, DADA2 only supports Illumina and PacBio but not Nanopore. And I have to correct myself, Nanopore is currently not meant to be included. I'll edit the comment above so that nobody else is deceived!

jtangrot commented 4 years ago

> Probably dada2:::removePrimers is better suited than cutadapt. This function can also immediately re-orient PacBio reads, e.g. dada2:::removePrimers(rawFILES[i], trimmedFILES[i], primer.fwd=primerFW, primer.rev=dada2:::rc(primerRV), orient=TRUE)

In our experience, cutadapt does a better job recognising the primers, but maybe that's not what you have seen? Anyway, with the (relatively new) option --rc, cutadapt can also re-orient the reads during primer removal.

d4straub commented 4 years ago

Thanks for this remark. I did not specifically compare the performance of dada2:::removePrimers with cutadapt. I also have not yet used cutadapt's --rc, but it seems like a valid solution as well.

jtangrot commented 4 years ago

What was maybe not clear in the discussions at the Stockholm hackathon was that we would also like to add support for ITS (which is the current use case we have for PacBio data). Should a separate issue be opened for that?

d4straub commented 4 years ago

I have close to no experience with analyzing ITS sequences. Would a processing path for ITS be very different from 16S analysis with DADA2 (except for the taxonomic database, obviously)? If yes, then it would definitely be worth opening a separate issue.

d4straub commented 3 years ago

This was solved in #168, thanks @jtangrot!