transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

How much difference is expected between master_script.sh and master_script_preserving_unmerged.sh? #43

Closed Shicheng-Guo closed 4 years ago

Shicheng-Guo commented 4 years ago

Dear Sam,

How much difference should I expect between the results of master_script_preserving_unmerged.sh and master_script.sh when both are run on the same dataset?

Thanks.

Shicheng

transcript commented 4 years ago

Hi Shicheng,

This will vary depending on your sequencing approach and the amount of overlap when merging paired-end reads.

In general, I've found that about 15-40% of reads may be dropped in the paired-end merging step. How much this affects your results depends on the depth of sequencing. If you are concerned that your sequencing may be too shallow to capture the full diversity of your samples, you can keep the unmerged reads by using master_script_preserving_unmerged.sh, which partially remedies this by increasing the number of reads that are annotated.
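To see where a particular dataset falls in that range, one option is to compare the number of read pairs going into the merging step with the number of merged reads coming out. The following is only a minimal sketch: the function names and the example counts are hypothetical and are not part of SAMSA2, and it assumes uncompressed FASTQ input with exactly four lines per record.

```python
def count_fastq_records(lines):
    """Count records in an uncompressed FASTQ stream.

    Assumes the standard 4-lines-per-record layout (no wrapped sequences).
    `lines` can be an open file handle or any iterable of lines.
    """
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return n_lines // 4


def fraction_dropped(n_input_pairs, n_merged):
    """Fraction of input read pairs that did not survive merging."""
    if n_input_pairs <= 0:
        raise ValueError("n_input_pairs must be positive")
    if not 0 <= n_merged <= n_input_pairs:
        raise ValueError("n_merged must be between 0 and n_input_pairs")
    return (n_input_pairs - n_merged) / n_input_pairs


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    n_pairs = 1_000_000   # read pairs before merging
    n_merged = 700_000    # merged reads after merging
    dropped = fraction_dropped(n_pairs, n_merged)
    print(f"{dropped:.1%} of read pairs were dropped at the merging step")
```

For real files, you would pass an open handle for the forward-read FASTQ and for the merged FASTQ to `count_fastq_records` and feed the two counts to `fraction_dropped`. If the resulting fraction is toward the high end of the range above and your sequencing is shallow, that is the situation where keeping the unmerged reads is most likely to matter.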

Best, Sam