hariszaf / pema

PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S rRNA, ITS and COI marker genes

Enhancement: Add bbmap suite for read preprocessing #39

Open natgiot opened 2 years ago

natgiot commented 2 years ago

BBMap is available here: https://sourceforge.net/projects/bbmap/ . It has been published in PLOS ONE (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657622/) and adopted by a wide community, including the JGI, whose website hosts a guide for bbmerge: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbmerge-guide/

Depending on how the suite of tools is integrated into PEMA, merging can be done with bbmerge, and additional steps (trimming, adapter removal) can also be handled quickly and efficiently by the same package.
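To make the merging step concrete, here is a minimal sketch of how a bbmerge call for one paired-end sample could be assembled. It is written in Python purely for illustration (PEMA's own scripts are .bds files), and the function name, file names and output directory are hypothetical. Only the `in1=`, `in2=`, `out=`, `outu1=` and `outu2=` arguments of `bbmerge.sh` are used; the function just builds the command and does not run it.

```python
def bbmerge_command(forward, reverse, out_dir):
    """Build the argument list for one bbmerge.sh run (illustrative only)."""
    return [
        "bbmerge.sh",
        f"in1={forward}",                      # forward reads
        f"in2={reverse}",                      # reverse reads
        f"out={out_dir}/merged.fastq",         # successfully merged pairs
        f"outu1={out_dir}/unmerged_1.fastq",   # forward reads that did not merge
        f"outu2={out_dir}/unmerged_2.fastq",   # reverse reads that did not merge
    ]


# Hypothetical sample and directory names.
cmd = bbmerge_command("sample_1.fastq", "sample_2.fastq", "merged_out")
print(" ".join(cmd))
```

The resulting list can be passed as-is to `subprocess.run(cmd, check=True)` on a machine where `bbmerge.sh` is on the `PATH`.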

Thank you for considering adding the tool. It appears to handle the merging of fully overlapping reads (cases where the insert size equals the read length) better than PANDAseq!

hariszaf commented 1 year ago

This issue is about replacing the merging function with a new one that invokes the BBMap tool.

As a first step, a bbmap function needs to be added in the preprocess.bds script.

Then an if statement should be added in pema_latest.bds, along with a parameter in the parameter files asking the user which merging approach to use.
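A rough sketch of that plan, in Python rather than BDS and only to illustrate the control flow: read a parameter that names the merging tool, then branch on it. The parameter name `mergingTool`, the tab-separated `name<TAB>value` layout and both function names are assumptions made for this example, not PEMA's actual parameter file or code. The PANDAseq branch uses its `-f`, `-r` and `-w` options; the bbmerge branch uses `in1=`, `in2=` and `out=`.

```python
def read_parameters(text):
    """Parse 'name<TAB>value' lines, skipping blank lines and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("\t")
        params[name.strip()] = value.strip()
    return params


def merge_command(params, forward, reverse, output):
    """Return the merging command chosen by the (hypothetical) mergingTool parameter."""
    # Default to PANDAseq so existing parameter files keep their behaviour.
    tool = params.get("mergingTool", "pandaseq").lower()
    if tool == "bbmerge":
        return ["bbmerge.sh", f"in1={forward}", f"in2={reverse}", f"out={output}"]
    if tool == "pandaseq":
        return ["pandaseq", "-f", forward, "-r", reverse, "-w", output]
    raise ValueError(f"unknown merging tool: {tool!r}")


# Hypothetical parameter file content.
example = "# merging step\nmergingTool\tbbmerge\n"
params = read_parameters(example)
print(merge_command(params, "s_1.fastq", "s_2.fastq", "s_merged.fastq"))
```

Falling back to PANDAseq when the parameter is missing means older parameter files would still run unchanged, while an unrecognised value fails early instead of silently picking a tool.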