Open cathreenj opened 5 years ago
At a brief glance, I would be a bit concerned about the small number of ASVs ("OTUs") that you are getting at the end of this. Can you tell us what amplicon region you are sequencing? What is your primer set? I suspect that you have cut your reads too short (truncLen too small) and are losing almost all your reads during merging because they no longer overlap.
Also, the figures didn't seem to post in your message, could you possibly upload in particular the quality profiles?
Hi benjjneb, thanks a lot for taking the time. First things first: I sequenced the mcrA gene, which codes for the alpha subunit of the methyl-coenzyme M reductase. The primers were 1046f and 1435r. I used a corresponding reference database. Regarding the truncLen, as far as I know I didn't use the truncLen parameter... I know from the raw sequence summary statistics that a substantial proportion of my reads only had a length of around 20 bp... so I guess that there was a problem with the sequencing of my samples. I added my quality plots in this comment.
Thank you again
Cathrine
I sequenced the mcrA gene, which codes for the alpha subunit of the methyl-coenzyme M reductase. The primers were 1046f and 1435r. I used a corresponding reference database.
So the amplicons will be ~400 nts in length? Do you have a prior expectation of how tight the length distribution is for mcrA? (I'm not familiar with this locus.)
Regarding the truncLen, as far as I know I didn't use the truncLen parameter...
I see. That is probably OK, although if the amplicon is consistently shorter than 410 nts I would probably remove a bit of the end of the reverse reads given the quality drop there, something like truncLen=c(0, 200).
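To make the overlap concern concrete, here is a rough sketch of the arithmetic in base R: the expected overlap is the forward read length plus the reverse read length minus the amplicon length. The forward read lengths below are hypothetical (the thread does not state the run's read length), and the function name expected_overlap is made up for illustration. The threshold of 12 is the default minOverlap of mergePairs in dada2.

```r
# Expected overlap of a read pair: forward length + reverse length - amplicon length.
# expected_overlap is a hypothetical helper, not part of dada2.
expected_overlap <- function(len_fwd, len_rev, amplicon_len) {
  len_fwd + len_rev - amplicon_len
}

amplicon_len <- 400      # approximate mcrA amplicon length mentioned in the thread
len_rev <- 200           # reverse reads truncated via truncLen = c(0, 200)
len_fwd <- c(250, 300)   # ASSUMPTION: hypothetical untruncated forward read lengths

overlap <- expected_overlap(len_fwd, len_rev, amplicon_len)
min_overlap <- 12        # default minOverlap in dada2::mergePairs

# One row per assumed forward read length
overlap_table <- data.frame(len_fwd, len_rev, overlap,
                            can_merge = overlap >= min_overlap)
print(overlap_table)
```

If the computed overlap falls below the minimum for the real read lengths, the truncation is too aggressive and pairs will fail to merge. Very short reads (such as the ~20 bp ones mentioned here) can never reach the required overlap regardless of truncLen.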
I know from the raw sequence summary statistics, that a substantial proportion of my reads only had a length of around 20 bp... so I guess, that there was a problem with the sequencing of my samples.
That could explain some of the loss of reads. You could also filter those out in the filtering step, e.g. with minLen=c(50, 50). The key question is how many reads you've kept at the end of the workflow. What is the output of sample_sums(ps)?
Hi Benjjneb, sorry for the delay... Yes, the amplicons are ~400 bp long. I couldn't find any info on the expected length distribution. We don't have a Bioanalyzer here, so I couldn't check myself. I tried to truncate the last few bases on the reverse reads, but it didn't really make a difference. Many of my reads are very short...
out_tr <- filterAndTrim(fnFs_tr, filtFs.tr, fnRs_tr, filtRs.tr,
                        minLen = c(50,50), truncLen = c(0, 200), maxN=0, maxEE=c(2,5), minQ=2,
                        compress=FALSE, verbose=T, multithread=TRUE) # On Windows set multithread=FALSE
print(out_tr)
                         reads.in reads.out
M1-T1_R1_trimmed.fastq      13691      4482
M2-T1_R1_trimmed.fastq      19217      3503
M4-T2_R1_trimmed.fastq      18028      1722
M5-T2_R1_trimmed.fastq      16042      1014
M6-T2_R1_trimmed.fastq      17724      3968
M7-R28_R1_trimmed.fastq     18974      1662
M9-R30_R1_trimmed.fastq      9444      3807
so the output of sample_sums is:
sample_sums(ps_tr)
M1-T1 M2-T1 M4-T2 M5-T2 M6-T2
 4134  3322  1700  1005  3838
Thank you for your time,
Cathrine
Many of my reads are very short...
If that's the case, then it might just be unavoidable that you are losing most of your reads in the filtering step, just not a great sequencing run.
It does seem that after filtering, >90% of the reads are making it through to the end of the pipeline, which is a good sign. So, given the apparent issues with data quality, you are probably OK to use the denoised data, and just have to accept that the library sizes aren't that big after quality control.
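The ">90%" figure can be checked directly from the numbers posted above. This is a small base R sketch that just re-types the reads.out counts from filterAndTrim and the sample_sums output for the five samples shown; the variable names are made up for illustration.

```r
# Reads surviving filterAndTrim (reads.out), copied from the output posted above
reads_filtered <- c("M1-T1" = 4482, "M2-T1" = 3503, "M4-T2" = 1722,
                    "M5-T2" = 1014, "M6-T2" = 3968)

# Reads remaining at the end of the workflow, copied from sample_sums(ps_tr)
reads_final <- c("M1-T1" = 4134, "M2-T1" = 3322, "M4-T2" = 1700,
                 "M5-T2" = 1005, "M6-T2" = 3838)

# Percentage of filtered reads that made it through denoising, merging and chimera removal
pct_retained <- round(100 * reads_final / reads_filtered[names(reads_final)], 1)
print(pct_retained)

# TRUE if every sample keeps more than 90% of its filtered reads
all(pct_retained > 90)
```

In a real run, a per-step tracking table (filtered, denoised, merged, non-chimeric) is more informative than the end-to-end ratio, because it shows which step loses the reads.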
Ok! That's good news! Thank you very much for your help!
Hi, I reanalysed my Illumina MiSeq data and found some discrepancies between the old and my new results. I will post my code and the R output here in the hope of receiving some feedback on the accuracy of my coding... I would very much appreciate it if someone could tell me if what I did is correct. Thank you very much.