kdbchau opened 1 month ago
"(R1 and R2 reads have already been merged into each sample file)."
You say "merged" but also mention "interleaved". These are two different things, so I'm not sure which one you mean. To me, merging means that the two reads from a read pair are merged into a single-end read, like this:
before:
R1 ---------------->
R2 <--------------------
after:
---------------------------->
On the other hand, "interleaved" just refers to a way of storing paired-end reads in a single file. While you usually have two files, one with the R1 reads and one with the R2 reads, an interleaved file stores the two reads of each pair one after the other, so the order of reads in the file is R1, R2, R1, R2, and so on.
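One quick way to tell the two layouts apart is to look at the read-number field in the Illumina-style headers (the first `1` in `1:N:0:1`). This is a minimal sketch, assuming plain four-line FASTQ records and headers of the form `@name 1:N:0:1`; `count_mates` is a hypothetical helper name, not part of Cutadapt:

```shell
#!/usr/bin/env bash
# count_mates FASTQ: hypothetical helper, not part of Cutadapt.
# Assumes 4-line FASTQ records with Illumina-style headers such as
# "@VH01754:2:...:7210 1:N:0:1", where the first field of the comment is the mate number.
count_mates() {
  awk 'NR % 4 == 1 {
         split($2, f, ":")   # f[1] is the read (mate) number: 1 or 2
         n[f[1]]++
       }
       END { printf "R1=%d R2=%d\n", n["1"] + 0, n["2"] + 0 }' "$1"
}
```

An interleaved file should report roughly equal R1 and R2 counts. A file in which every header carries `1:...` (R2=0), as described in the question in this thread, points to merged or single-end data rather than interleaved pairs.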
Judging from the error message, I would guess that you have merged data, so the --interleaved option doesn't apply, and neither do the uppercase options -A and -G.
If it is merged data, you may want to trim it with a linked adapter (see the documentation) using -a ^forwardprimer...revcomp-of-reverse-primer (i.e., -a ^CTTGGTCATTTAGAGGAAGTAA...GCTGCGTTCTTCATCGATGC$).
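For merged reads that span the whole amplicon, the read starts with the forward primer and ends with the reverse complement of the reverse primer (written 5'→3'). The sketch below builds the `-a` argument from two primer sequences. `revcomp` and `linked_adapter` are hypothetical helper names, and only unambiguous A/C/G/T bases are handled; IUPAC codes such as `N` or `R` would need extra `tr` mappings.

```shell
#!/usr/bin/env bash
# revcomp SEQ: reverse-complement a plain A/C/G/T sequence (hypothetical helper).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# linked_adapter FWD REV: print "^FWD...revcomp(REV)$", i.e. an anchored
# linked adapter for merged reads (hypothetical helper).
linked_adapter() {
  printf '^%s...%s$\n' "$1" "$(revcomp "$2")"
}
```

For example, `linked_adapter CTTGGTCATTTAGAGGAAGTAA GCTGCGTTCTTCATCGATGC` prints `^CTTGGTCATTTAGAGGAAGTAA...GCATCGATGAAGAACGCAGC$`. Whether the sequence you pass in is the primer itself or already its reverse complement is exactly what the sanity check later in this thread settles, so confirm the orientation before relying on the result.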
I would also add the option --discard-untrimmed so that all reads that don't look as expected are discarded. The ^ at the beginning of the forward primer makes the adapter anchored, which means that it must start at the beginning of the read; likewise, the $ at the end anchors the reverse-primer part to the end of the read.
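To make the effect of the anchors and --discard-untrimmed concrete, here is a toy, exact-match emulation in Bash. It only illustrates the semantics and assumes zero mismatches and no indels; Cutadapt itself tolerates errors (10% by default) and uses its own alignment algorithm. `trim_linked` is a hypothetical name:

```shell
#!/usr/bin/env bash
# trim_linked SEQ FWD RC_REV: toy emulation of an anchored linked adapter
# (^FWD...RC_REV$) combined with --discard-untrimmed, using exact matching only.
# Prints the insert between the primers, or nothing if SEQ does not start
# with FWD and end with RC_REV (i.e. the read would be discarded).
trim_linked() {
  local sequence=$1 fwd=$2 rc=$3
  if [[ $sequence == "$fwd"*"$rc" ]]; then
    local insert=${sequence#"$fwd"}   # drop the anchored 5' primer
    printf '%s\n' "${insert%"$rc"}"   # drop the anchored 3' primer
  fi
}
```

A read with any extra bases before the forward primer or after the reverse-primer part fails the anchored pattern and is dropped, which is why anchoring combined with --discard-untrimmed is strict.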
If this does not work, please remove the ^ and $ from the command, re-run it, and paste the report ("length of removed sequences") here.
By the way, your Cutadapt version is quite old (2.8); I recommend using a more recent one.
Ok, I ran it with Cutadapt version 4.9 and I finally see some output. Here are my code and output. I did have to remove the ^ and $ symbols for it to work; otherwise all my files were empty.
Code:
for files in *.fastq; do /home/chauk/.local/bin/cutadapt -a CTTGGTCATTTAGAGGAAGTAA...GCTGCGTTCTTCATCGATGC -n 2 -o "output2/${files%%.fastq}_cut.fastq" "$files"; done
Output:
Overview of removed sequences at 3' end
length  count  expect  max.err  error counts
5       7      46.9    0        7
Not sure if this makes sense: it seems that only 7 sequences were removed, although the R tutorial previously showed that at least 70 reads contained the forward primer and 68 the reverse primer.
Ok, I also ran the sanity check, but the reverse primers were not removed:
> # Sanity check, count the presence of primers in the first cutadapt-ed sample
> rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = myfiles.cut[[1]]),
+ FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = myfiles.cut[[1]]),
+ REV.ForwardReads = sapply(REV.orients, primerHits, fn = myfiles.cut[[1]]),
+ REV.ReverseReads = sapply(REV.orients, primerHits, fn = myfiles.cut[[1]]))
                 Forward Complement Reverse RevComp
FWD.ForwardReads       0          0       0       0
FWD.ReverseReads       0          0       0       0
REV.ForwardReads       0          0       0      68
REV.ReverseReads       0          0       0      68
So I will try again, because I think my reverse primer needs to be the reverse complement instead (since it matches in the RevComp orientation). But at least the forward primers are removed!
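The four columns in the sanity check correspond to the four orientations of each primer, as produced by `allOrients` in the DADA2 ITS tutorial. A rough shell analogue is sketched below, assuming exact matches only: DADA2's `primerHits` uses Biostrings' `vcountPattern` with `fixed = FALSE`, so it also honors IUPAC ambiguity codes, which this sketch does not. `orients` and `primer_hits` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# orients PRIMER: print Forward, Complement, Reverse and RevComp of a plain
# A/C/G/T primer, one per line (same order as the sanity-check columns).
orients() {
  local p=$1
  printf '%s\n' "$p"                            # Forward
  printf '%s\n' "$p" | tr 'ACGT' 'TGCA'         # Complement
  printf '%s\n' "$p" | rev                      # Reverse
  printf '%s\n' "$p" | rev | tr 'ACGT' 'TGCA'   # RevComp
}

# primer_hits PATTERN FASTQ: number of reads whose sequence line contains
# PATTERN exactly (a simplified stand-in for DADA2's primerHits()).
# Assumes 4-line FASTQ records.
primer_hits() {
  awk -v pat="$1" 'NR % 4 == 2 && index($0, pat) > 0 { n++ } END { print n + 0 }' "$2"
}
```

For example, `for o in $(orients GCTGCGTTCTTCATCGATGC); do primer_hits "$o" sample_cut.fastq; done` prints one count per orientation, where `sample_cut.fastq` is a placeholder for one of your trimmed files. A primer that only shows up in the RevComp row is present in the reads as its reverse complement, which is the sequence the adapter needs to contain.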
Minor update: yes, changing the REV primer to its reverse complement worked, and now my sanity check output is:
                 Forward Complement Reverse RevComp
FWD.ForwardReads       0          0       0       0
FWD.ReverseReads       0          0       0       0
REV.ForwardReads       0          0       0       0
REV.ReverseReads       0          0       0       0
Hello. I'm having a problem trying to trim my primers from my interleaved fastq files (R1 and R2 reads have already been merged into each sample file).
I am using Bash on the Windows Subsystem for Linux; Cutadapt (version 2.8) was installed with sudo, with no issues there. I am running Python 3.8.10. An example file name is CONTROL_H12_Rep1_ITS1-F_M13F-ITS2_M13R.fastq. The ITS FWD and REV reads are together here. The primers I use:
Because I was following the DADA2 ITS pipeline in R, I had output for my fastq files showing that the REV primer was detected as the complement of itself. Example of the output I got:
As such, I changed REV to be the reverse complement, so:
And then I also have the reverse complements for the primers:
And here is the code I used in the terminal (because the cutadapt command within the dada2 R script wouldn't work in R for me):
But I get an error message that the reads are improperly paired:
cutadapt: error: Error in sequence file at unknown line: Reads are improperly paired. Name 'VH01754:2:AAFFFT2M5:1:1101:35538:7210 1:N:0:1' (first) does not match 'VH01754:2:AAFFFT2M5:1:1101:50365:12113 1:N:0:1' (second).
But I'm not sure how to fix this. When I look at the read names, each appears only once and all end with "1:N:0:1", so it's almost as if the file had been converted into a single-end read file, even though the sequencing was done as paired-end. It's very confusing, and I was wondering if you know how to remedy this so that both primers are removed. I have tried treating this as single-end, but it is unclear whether it removes both the FWD and REV primer complements.
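In interleaved mode Cutadapt expects records to come in pairs with matching names, and the error above reports the first pair where they differ. The minimal sketch below reproduces that check so a file can be inspected before running Cutadapt. It assumes four-line records and, as a simplification, compares only the header text before the first space; Cutadapt's own comparison is somewhat more lenient (for example about mate suffixes), so treat this as an approximation. `first_unpaired` is a hypothetical name.

```shell
#!/usr/bin/env bash
# first_unpaired FASTQ: hypothetical helper. Treats records 1+2, 3+4, ... as
# mates and prints the first pair whose IDs (header text before the first
# space, without the leading "@") differ; prints nothing if all pairs match.
first_unpaired() {
  awk 'NR % 4 == 1 {
         id = substr($1, 2)   # drop the leading "@"
         rec++
         if (rec % 2 == 1) { prev = id }
         else if (id != prev) { print prev " != " id; exit }
       }' "$1"
}
```

On a file like the one described here, where every header ends in 1:N:0:1 and consecutive names differ, the very first pair is reported, consistent with the Cutadapt error message.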