transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

Combining the concatenated (unassembled) forward and reverse reads with the assembled reads #51

memoll closed this issue 3 years ago

memoll commented 3 years ago

Hi Sam,

I've done paired-end sequencing and managed to merge ~70% of the forward and reverse reads using PEAR. I then ran combining_unmerged.R on the unassembled forward and reverse reads to concatenate them. Are these concatenated forward and reverse reads combined with the assembled reads at any step of the pipeline, and if so, which script does that? Thanks in advance!

transcript commented 3 years ago

Hi Mona,

In the standard workflow setup, only the reads that PEAR successfully merges move forward past Step 2 in the master_script.sh script. This means that any forward or reverse reads that are not merged into a single read by PEAR are discarded, and aren't used at any later point.

If you're concerned about not having enough merged reads from PEAR's output, you can append the unmerged forward reads to the merged reads file (unix cat is an easy way to do so), and use this combined merged + unmerged-forward set in Step 3 and onward.
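As a rough sketch of the cat approach, assuming uncompressed FASTQ files with four lines per record: the function name `combine_reads` and the file names in the example call are hypothetical placeholders, not names used by master_script.sh, so substitute whatever your PEAR step actually wrote.

```shell
#!/usr/bin/env bash
# Sketch: append PEAR's unmerged forward reads to the merged reads.
# combine_reads and all file names below are hypothetical placeholders.

combine_reads() {
    local merged="$1"            # reads PEAR merged into single sequences
    local unmerged_forward="$2"  # forward reads PEAR could not merge
    local combined="$3"          # output file to use from Step 3 onward

    # Both inputs must exist before anything is written.
    [ -f "$merged" ] || { echo "missing: $merged" >&2; return 1; }
    [ -f "$unmerged_forward" ] || { echo "missing: $unmerged_forward" >&2; return 1; }

    # FASTQ is plain text, so concatenating two files gives one valid file.
    cat "$merged" "$unmerged_forward" > "$combined"

    # Report read counts as a sanity check (assumes 4 lines per record).
    local n_merged n_forward n_combined
    n_merged=$(( $(wc -l < "$merged") / 4 ))
    n_forward=$(( $(wc -l < "$unmerged_forward") / 4 ))
    n_combined=$(( $(wc -l < "$combined") / 4 ))
    echo "merged=$n_merged unmerged_forward=$n_forward combined=$n_combined"
}

# Example call (placeholder file names):
# combine_reads sample.merged.fastq sample.unmerged_forward.fastq sample.combined.fastq
```

The combined count should equal the sum of the other two; if it doesn't, one of the inputs is probably compressed or not in four-line FASTQ format.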

In general, I see PEAR merge 45-65% of the initial (unmerged) reads that go into the pipeline. If this lowers your read count too much, it's worth considering adding the unmerged forward reads, although keep in mind that this may reduce the accuracy of annotations.

-Sam