Closed mikemc closed 4 years ago
Hi @mikemc, Thanks for your interest in mOTUs.
So, you have paired-end reads (forward and reverse). Some reads are of low quality, and it is better to either remove them or trim their low-quality bases.
You can use trimmomatic to filter and trim reads. Note that if you remove one read from the forward file, then the corresponding read in the reverse file doesn't have a pair. You have two options now:
- remove the corresponding read from the reverse file as well, so that only complete pairs remain.
- keep the read, but put it in a new file (which contains reads without pairs); we call this file `single`.
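To make the two options concrete, here is a minimal Python sketch of how reads could be sorted into paired and single sets after filtering. It is only an illustration of the idea, not what trimmomatic or mOTUs do internally; the function name `split_pairs` and the toy read IDs are made up for the example.

```python
def split_pairs(forward, reverse):
    """Split filtered reads into intact pairs and orphaned single reads.

    forward, reverse: dicts mapping a read ID to its sequence, holding only
    the reads that survived quality filtering (hypothetical toy input).
    """
    shared = forward.keys() & reverse.keys()
    # Reads whose mate also survived stay paired, in forward-file order.
    paired_fwd = {rid: seq for rid, seq in forward.items() if rid in shared}
    paired_rev = {rid: reverse[rid] for rid in paired_fwd}
    # Reads whose mate was discarded go to a separate "single" set.
    single = {rid: seq for rid, seq in forward.items() if rid not in shared}
    single.update((rid, seq) for rid, seq in reverse.items() if rid not in shared)
    return paired_fwd, paired_rev, single


forward = {"read1": "ACGT", "read2": "GGCA", "read3": "TTAG"}
reverse = {"read1": "TGCA", "read3": "CTAA", "read4": "GATC"}
paired_fwd, paired_rev, single = split_pairs(forward, reverse)
print(sorted(paired_fwd))  # IDs present in both files
print(sorted(single))      # IDs whose mate was filtered out
```

With the first option you would simply drop the `single` set; with the second you write it to its own file and pass it along with the paired files.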
I see, thanks for the clarification! Does the mapping done by `motus profile` use the paired-end information, or would the results be the same if the forward and reverse reads were filtered independently (losing the paired correspondence and keeping reads where the mate was discarded) and then passed to mOTUs with `motus profile -s forward_reads_filtered.fq,reverse_reads_filtered.fq`?
`motus profile` uses paired-end information when mapping the reads.
It helps to assign reads that map to multiple species. For example, if for_read maps equally well to species_1 and species_2, and rev_read maps equally well to species_2 and species_3, then we know that the read pair comes from species_2.
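The reasoning in that example amounts to a set intersection. The Python sketch below only illustrates the logic; it is not how mOTUs implements it, and the function name `species_supported_by_pair` is hypothetical.

```python
def species_supported_by_pair(fwd_hits, rev_hits):
    """Return the species that both mates of a read pair map to equally well."""
    return set(fwd_hits) & set(rev_hits)


fwd_hits = {"species_1", "species_2"}  # best hits of for_read
rev_hits = {"species_2", "species_3"}  # best hits of rev_read
print(species_supported_by_pair(fwd_hits, rev_hits))  # {'species_2'}
```

If the reads were passed as unpaired with `-s`, each read would be considered on its own and this kind of disambiguation would not be available.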
Got it, thanks.
I'm interested in whether you think trimming and filtering is necessary (beyond removing any adapter sequences), and if so what parameters (e.g. for trimmomatic) you use in your own analysis or in the benchmarking datasets from the mOTUs2 paper. The BWA creator has suggested that quality trimming is not necessary for BWA-MEM due to the soft-clipping behavior, but the BWA FAQ indicates that reads with error rates above 2% on 100bp might not work well, so it is unclear to me whether spurious alignments are likely enough to be a problem for the resulting mOTUs2 profiles. Have you seen benefits of trimming and filtering beyond computational speed-ups from having to align fewer reads? For example, does skipping it lead to more false positives in the mOTUs that are identified?
> The BWA creator has suggested that quality trimming is not necessary for BWA-MEM due to the soft-clipping behavior, but the BWA FAQ indicates error rates above 2% on 100bp reads might not work well, so it is unclear to me whether spurious alignments are likely enough to be a problem for the resulting mOTUs2 profiles. Have you seen benefits of trimming and filtering beyond computational speed-ups from having to align less reads? I.e., more false-positives in the mOTUs that are identified?
Trimming reads doesn't make much of a difference. For example, here I evaluated ~100 samples with mOTUs v2: changing `-g` has a bigger effect than trimming.
> and if so what parameters (e.g. for trimmomatic) you use in your own analysis or in the benchmarking datasets from the mOTUs2 paper.
I used the following parameters for trimmomatic:
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
which are also the ones used on the Trimmomatic website.
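For anyone wanting to put these parameters into a full workflow, here is a rough Python sketch that assembles a Trimmomatic paired-end command and a matching `motus profile` call. Treat it as a sketch under assumptions: the file names and the `sample` prefix are hypothetical, the executable name `trimmomatic` depends on your installation (it may instead be a `java -jar` call), and the `-f`/`-r` flags reflect my understanding of the mOTUs command line, so check `motus profile` help before relying on them. Trimmomatic's paired-end mode writes four outputs in the order forward paired, forward unpaired, reverse paired, reverse unpaired.

```python
# Trimming steps quoted in the thread.
TRIM_STEPS = ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36"]


def trimmomatic_pe_command(fwd, rev, prefix, steps=TRIM_STEPS, exe="trimmomatic"):
    """Build the argument list for Trimmomatic in paired-end (PE) mode."""
    outputs = [
        f"{prefix}_1P.fq",  # forward reads whose mate survived
        f"{prefix}_1U.fq",  # forward reads whose mate was dropped
        f"{prefix}_2P.fq",  # reverse reads whose mate survived
        f"{prefix}_2U.fq",  # reverse reads whose mate was dropped
    ]
    return [exe, "PE", fwd, rev, *outputs, *steps]


def motus_profile_command(prefix):
    """Build a motus call: paired files via -f/-r, orphaned reads via -s."""
    singles = f"{prefix}_1U.fq,{prefix}_2U.fq"
    return ["motus", "profile", "-f", f"{prefix}_1P.fq",
            "-r", f"{prefix}_2P.fq", "-s", singles]


trim_cmd = trimmomatic_pe_command("forward_reads.fq", "reverse_reads.fq", "sample")
motus_cmd = motus_profile_command("sample")
print(" ".join(trim_cmd))
print(" ".join(motus_cmd))
```

The lists can be handed to `subprocess.run(cmd, check=True)` once the tools are installed; printing them first makes it easy to inspect the commands before running anything.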
Great, thanks again for the info. That answers my questions so I'll go ahead and close the issue.
The paper and wiki say that reads should be appropriately quality controlled. Do you recommend a particular filter-and-trim strategy for paired-end Illumina reads that works well with mOTUs2? Also, the wiki gives two different commands for profiling with paired-end reads:
Can you clarify what is happening in the second case (what is in `single_reads.fq` and how is it being used with the forward and reverse reads) and which method you typically suggest? Thanks!