Closed FlintMitchell closed 1 year ago
I think it refuses to merge them due to the non-matching tail of T's (ATCTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT) that would need to be clipped. The ends of the sequences must match for the sequences to be merged.
I have some paired-end reads that overlap, but I am unable to merge any of the reads. For example, taking one read from these files:
test_R1.fastq:
test_R2.fastq:
These two reads have perfectly matching 23 bp reverse-complement sequences (bolded):
TCTGCTGCTCCCCGGGTGTGGCTCCTTCATCTGACAACGTGCAACCCCTATCGCGATGGCAAAGGAAAGGAAGCCCTGCTTCCTCCAGATTTCGTTATAGGACAGCGGGATCTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
CAGGTCCATCGATTGTTTCTGCGGACGGTGTTGTCCTCATAGTTTGGGCATGTTTCGCTTCCAGCCCAGCCAAACTTGTCAACCAGTATCCCGGTGCAGGAGCTGCACATACTAGCCCCTGTCTAGGACCCGCTGTCCTATAACGAAATCT
where the overlaps match like so:
...AAGCCCTGCTTCCTCCAGATTTCGTTATAGGACAGCGGGATCTTTTCT...
___TCTAAAGCAATATCCTGTCGCCCAGGATCTGTCCCCGATC...
When using:
vsearch --fastq_mergepairs test_R1.fastq --reverse test_R2.fastq --fastqout testmerge.fastq
I get
Or when combining some flags that were used in other examples to lessen the strictness of the tool:
vsearch --fastq_mergepairs test_R1.fastq --reverse test_R2.fastq --fastqout testmerge.fastq --fastq_allowmergestagger --fastq_maxdiffs 30 --fastq_minovlen 5 --fastq_qmin 0
results in the same thing, not merging the two files.
When I run vsearch with the above command on the fastq files that contain all of the reads, none of them merge (out of 50k+ reads, for 40k+ of which I used grep to confirm that the overlapping sequence is present!). Any help would be greatly appreciated.
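For anyone who wants to check this kind of case by hand, here is a minimal Python sketch that locates where the reverse complement of R2 begins inside R1 and reports whatever part of R1 runs past the exact match. It is only a diagnostic under stated assumptions, not a description of how vsearch aligns or scores read pairs: the helper names `reverse_complement` and `unmatched_tail` and the `seed_len` parameter are made up for illustration, and the check uses a plain exact-match extension with no quality scores.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(complement)[::-1]


def unmatched_tail(forward: str, reverse: str, seed_len: int = 20):
    """Find where RC(reverse) starts inside forward and return
    (start, matched_length, tail_of_forward_after_the_match).

    Hypothetical helper: it uses an exact seed and exact extension only,
    so it returns None as soon as the first seed_len bases do not match.
    """
    rc = reverse_complement(reverse)
    start = forward.find(rc[:seed_len])
    if start == -1:
        return None
    matched = 0
    # Extend the exact match base by base until the first mismatch
    # or until either sequence runs out.
    while (start + matched < len(forward)
           and matched < len(rc)
           and forward[start + matched] == rc[matched]):
        matched += 1
    return start, matched, forward[start + matched:]


# The two reads quoted in this issue.
R1 = ("TCTGCTGCTCCCCGGGTGTGGCTCCTTCATCTGACAACGTGCAACCCCTATCGCGATGGCAAAGGAAAGG"
      "AAGCCCTGCTTCCTCCAGATTTCGTTATAGGACAGCGGGATCTTTTCTTTTTTTTTTTTTTTTTTTTTTT"
      "TTTTTTTTTTT")
R2 = ("CAGGTCCATCGATTGTTTCTGCGGACGGTGTTGTCCTCATAGTTTGGGCATGTTTCGCTTCCAGCCCAGC"
      "CAAACTTGTCAACCAGTATCCCGGTGCAGGAGCTGCACATACTAGCCCCTGTCTAGGACCCGCTGTCCTA"
      "TAACGAAATCT")

result = unmatched_tail(R1, R2)
if result is None:
    print("no exact seed match found")
else:
    start, matched, tail = result
    print(f"match starts at R1 position {start}, exact match length {matched}")
    print(f"R1 bases after the match ({len(tail)} bp): {tail}")
```

If the reported tail is non-empty, R1 continues past the point where it stops agreeing with the reverse complement of R2, which is the situation described in the reply: that tail would have to be clipped before the read ends can line up.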