Closed apeltzer closed 3 years ago
Adding in @d4straub to get him aboard too :-)
We have discussed how to technically integrate this, and lean towards writing one or more R scripts that do long read denoising. The idea would be to run this as part of the workflow, after primer removal with cutadapt, when users specify PacBio or similar, instead of the normal QIIME2 processing. At the end of the script, a QIIME2 artefact would be output and the rest of the workflow could continue. Probably, this would be just before taxonomy assignment.
(Primer removal might be somewhat different, since PacBio reads apparently contain sequences upstream of the primers (both forward and reverse) that should also be removed. A flag to cutadapt, I suppose.)
We will start working on the R script(s) after Christmas and after we agree on a plan. We're all happy to discuss!
(Adding @DiegoBrambilla too.)
Probably dada2:::removePrimers is better suited than cutadapt. This function can also immediately re-orient PacBio reads, e.g.
dada2:::removePrimers(rawFILES[i], trimmedFILES[i], primer.fwd=primerFW, primer.rev=dada2:::rc(primerRV), orient=TRUE)
I'm currently working on a workflow to analyse ONT full-length 16S sequences using QIIME2, based on this project [https://github.com/DeniRibicic/q2ONT]. Are you considering adding Nanopore data to the processing path?
Hi @nbargues, this looks interesting. ~Yes, Nanopore processing should be included one day. As far as I can see, you are using vsearch, but dada2 is planned for this pipeline.~ Sorry, I was wrong, that's not planned at the moment.
@d4straub Thanks for the response. I read that nanopore data is not supported by DADA2 currently and neither is the other denoising method Deblur. That's why vsearch is used.
@nbargues Oh! I am so sorry, I somehow mixed PacBio and Nanopore!! You are right, DADA2 only supports Illumina and PacBio but not Nanopore. And I have to correct myself, Nanopore is currently not meant to be included. I'll edit the comment above so that nobody else is deceived!
> Probably dada2:::removePrimers is better suited than cutadapt. This function can also immediately re-orient PacBio reads, e.g. dada2:::removePrimers(rawFILES[i], trimmedFILES[i], primer.fwd=primerFW, primer.rev=dada2:::rc(primerRV), orient=TRUE)
In our experience, cutadapt does a better job of recognising the primers, but maybe that's not what you have seen? Anyway, with the (relatively new) --rc option, cutadapt can also re-orient the reads during primer removal.
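To make "re-orienting during primer removal" concrete, here is a minimal plain-Python sketch of the idea: look for the primers in the read as given and, if they are not found, try its reverse complement. This is only an illustration and not how dada2 or cutadapt implement it. The function names, the toy primers and the exact-match search are all assumptions made for the example; the real tools tolerate mismatches and ambiguous bases.

```python
# Illustrative sketch only: NOT the dada2 or cutadapt implementation.
# All names and sequences below are hypothetical toy values.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_oriented(read, fwd, rev_rc):
    """Return the sequence between the primers, or None if either is missing.

    fwd is the forward primer; rev_rc is the reverse primer already
    reverse-complemented, so both are written in the read's direction.
    Exact matching is a simplifying assumption.
    """
    start = read.find(fwd)
    if start == -1:
        return None
    end = read.find(rev_rc, start + len(fwd))
    if end == -1:
        return None
    # Drops the primers and anything upstream/downstream of them.
    return read[start + len(fwd):end]


def remove_primers(read, fwd, rev):
    """Try the read as given, then its reverse complement (re-orientation)."""
    rev_rc = revcomp(rev)
    for candidate in (read, revcomp(read)):
        insert = trim_oriented(candidate, fwd, rev_rc)
        if insert is not None:
            return insert
    return None  # primers not found in either orientation


# Toy data: 8-base made-up primers and a short made-up amplicon.
FWD = "AGAGTTTG"
REV = "GGTTACCT"
read = "ACGT" + FWD + "TTCCGATC" + revcomp(REV) + "TT"

same_strand = remove_primers(read, FWD, REV)
flipped = remove_primers(revcomp(read), FWD, REV)
```

Both same_strand and flipped come out as the same trimmed sequence in forward orientation, which is the behaviour orient=TRUE and --rc are meant to provide. The sketch also drops whatever lies outside the primers, which matches the point above about PacBio reads carrying extra sequence upstream of the primers.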
Thanks for this remark. I did not specifically compare the performance of dada2:::removePrimers with cutadapt. I also have not yet used cutadapt's --rc; it seems like a valid solution as well.
What was maybe not clear in the discussions at the Stockholm hackathon was that we would also like to add support for ITS (which is the current use case we have for PacBio data). Should a separate issue be opened for that?
I have close to no experience with analyzing ITS sequences. Would the processing path for ITS be very different from the one for 16S with DADA2 (except for the taxonomic database, obviously)? If yes, then it would definitely be worth opening a separate issue.
This was solved in #168, thanks @jtangrot!
We had discussions with Anders @andand, Daniel @erikrikarddaniel and Jeanette @jtangrot about adding this at the Stockholm hackathon.
It would be an additional path in the existing ampliseq workflow, intended for a future release, to add capabilities that are currently missing.