Recommended usage with paired-end reads

motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling

GNU General Public License v3.0

Recommended usage with paired-end reads #42

Closed by mikemc 4 years ago

mikemc commented 4 years ago

The paper and wiki say that reads should be appropriately quality controlled. Do you recommend a particular filter-and-trim strategy for paired-end Illumina reads that works well with mOTUs2? Also, the wiki gives two different commands for profiling with paired-end reads:

As fastq input is possible to provide paired end reads:
motus profile -f forward_reads.fastq -r reverse_reads.fastq
as well as single reads that comes from quality filtering:
motus profile -f forward_reads.fq -r reverse_reads.fq -s single_reads.fq

Can you clarify what is happening in the second case (what is in single_reads.fq and how is it being used with the forward and reverse reads) and which method you typically suggest? Thanks!

AlessioMilanese commented 4 years ago

Hi @mikemc, Thanks for your interest in mOTUs.

So, you have paired-end reads (forward and reverse). Some reads are of low quality, and it is better to either remove them or trim their low-quality bases.

You can use trimmomatic to filter and trim reads. Note that if a read is removed from the forward file, the corresponding read in the reverse file no longer has a mate. You then have two options:

also throw away the read in the reverse file, or
keep the read, but put it in a new file (which contains reads without pairs), and we call this file single.
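
To make the second option concrete, here is a minimal Python sketch of how filtering each file separately leaves some reads without a mate. The read IDs and the `split_pairs` helper are hypothetical and only illustrate the bookkeeping; in practice a tool such as trimmomatic writes the paired and unpaired files for you, and the unpaired reads are what goes into the file passed with -s.

```python
def split_pairs(forward_ids, reverse_ids):
    """Split the read IDs that survived filtering into paired and single sets."""
    fwd, rev = set(forward_ids), set(reverse_ids)
    paired = fwd & rev  # both mates survived filtering
    return {
        "paired": sorted(paired),
        # reads whose mate was discarded end up in the "single" file
        "single": sorted((fwd | rev) - paired),
    }


# Hypothetical read IDs left in each file after quality filtering.
forward_kept = ["read1", "read2", "read4"]
reverse_kept = ["read1", "read3", "read4"]

result = split_pairs(forward_kept, reverse_kept)
print(result["paired"])  # ['read1', 'read4']
print(result["single"])  # ['read2', 'read3']
```

Here read2 lost its reverse mate and read3 lost its forward mate, so both would be written to the single file rather than being thrown away.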

mikemc commented 4 years ago

keep the read, but put it in a new file (which contains reads without pairs), and we call this file single.

I see, thanks for the clarification! Does the mapping done by motus profile use the paired-end information, or would the results be the same if the forward and reverse reads were filtered independently (losing the paired correspondence and keeping reads where the mate was discarded) and then passed to mOTUs with

motus profile -s forward_reads_filtered.fq,reverse_reads_filtered.fq

AlessioMilanese commented 4 years ago

motus profile uses paired-end information when mapping the reads.

It helps to assign reads that map to multiple species. For example, if for_read maps equally well to species_1 and species_2, and rev_read maps equally well to species_2 and species_3, then we know that the read pair comes from species_2.
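
The example above amounts to intersecting the candidate species of the two mates. The following Python sketch shows only that reasoning; `candidate_species` is a hypothetical helper, and this is not a description of how motus or BWA implement the assignment internally.

```python
def candidate_species(forward_hits, reverse_hits):
    """Return the species consistent with both mates of a read pair."""
    return sorted(set(forward_hits) & set(reverse_hits))


# Each mate on its own is ambiguous between two species.
for_read_hits = ["species_1", "species_2"]
rev_read_hits = ["species_2", "species_3"]

print(candidate_species(for_read_hits, rev_read_hits))  # ['species_2']
```

If the two files were passed as independent single reads, each read would keep its two equally good candidates, which is the information lost by dropping the pairing.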

mikemc commented 4 years ago

Got it, thanks.

I'm interested in whether you think trimming and filtering are necessary (beyond removing any adapter sequences), and if so what parameters (e.g. for trimmomatic) you use in your own analysis or in the benchmarking datasets from the mOTUs2 paper. The BWA creator has suggested that quality trimming is not necessary for BWA-MEM due to the soft-clipping behavior, but the BWA FAQ indicates error rates above 2% on 100bp reads might not work well, so it is unclear to me whether spurious alignments are likely enough to be a problem for the resulting mOTUs2 profiles. Have you seen benefits of trimming and filtering beyond computational speed-ups from having to align fewer reads? For example, fewer false positives among the mOTUs that are identified?

AlessioMilanese commented 4 years ago

The BWA creator has suggested that quality trimming is not necessary for BWA-MEM due to the soft-clipping behavior, but the BWA FAQ indicates error rates above 2% on 100bp reads might not work well, so it is unclear to me whether spurious alignments are likely enough to be a problem for the resulting mOTUs2 profiles. Have you seen benefits of trimming and filtering beyond computational speed-ups from having to align fewer reads? For example, fewer false positives among the mOTUs that are identified?

Trimming reads doesn't make much of a difference. For example, here I evaluated ~100 samples with mOTUs v2: changing -g has a bigger effect than trimming.

AlessioMilanese commented 4 years ago

and if so what parameters (e.g. for trimmomatic) you use in your own analysis or in the benchmarking datasets from the mOTUs2 paper.

I used the following parameters for trimmomatic:

LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

which are also the ones used on the Trimmomatic website.
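
As a rough sketch of how these parameters could be combined with the -f/-r/-s usage discussed earlier in the thread, the Python snippet below assembles the two command lines without running them. Several things here are assumptions rather than anything stated in the thread: the `trimmomatic` executable name (some installations use `java -jar trimmomatic-<version>.jar` instead), the `trimmed` output prefix and the `_1P/_1U/_2P/_2U` file names, and passing both unpaired files to -s as a comma-separated list (mirroring the comma-separated form used in the question above). No adapter-clipping step is included because none is listed above.

```python
import shlex

# Trimming steps quoted in the comment above.
TRIM_STEPS = ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36"]


def trimmomatic_cmd(fwd, rev, prefix):
    """Build a paired-end trimmomatic call (hypothetical output file names)."""
    # PE mode expects four outputs in this order:
    # forward paired, forward unpaired, reverse paired, reverse unpaired.
    outputs = [f"{prefix}_{tag}.fq" for tag in ("1P", "1U", "2P", "2U")]
    return ["trimmomatic", "PE", fwd, rev, *outputs, *TRIM_STEPS]


def motus_cmd(prefix):
    """Build a motus profile call using the paired files plus the unpaired reads."""
    return [
        "motus", "profile",
        "-f", f"{prefix}_1P.fq",
        "-r", f"{prefix}_2P.fq",
        # Assumption: both unpaired files are given to -s, comma-separated.
        "-s", f"{prefix}_1U.fq,{prefix}_2U.fq",
    ]


# Print the commands for inspection instead of executing them.
print(shlex.join(trimmomatic_cmd("forward_reads.fastq", "reverse_reads.fastq", "trimmed")))
print(shlex.join(motus_cmd("trimmed")))
```

Printing the commands first makes it easy to check the file order before running anything; the lists can be handed to `subprocess.run` once the paths match your setup.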

mikemc commented 4 years ago

Great, thanks again for the info. That answers my questions so I'll go ahead and close the issue.