hsgweon / pipits

Automated pipeline for analyses of fungal ITS from the Illumina
GNU General Public License v3.0

pispino_seqprep #58

Open favolaschia opened 7 months ago

favolaschia commented 7 months ago

Hi,

When I run pispino_seqprep -i rawdata -o prepped -l reads.txt on both my new data and my older data (which had previously run just fine), I get the following:

2024-02-05 11:31:30 pispino_seqprep started
2024-02-05 11:31:30 Checking listfile
2024-02-05 11:31:30 ... done
2024-02-05 11:31:30 Counting sequences in rawdata
2024-02-05 11:31:31 ... number of reads: 158497
2024-02-05 11:31:31 Reindexing forward reads
2024-02-05 11:31:33 ... done
2024-02-05 11:31:33 Reindexing reverse reads
2024-02-05 11:31:35 ... done
2024-02-05 11:31:35 Joining paired-end reads [VSEARCH]
2024-02-05 11:31:35 Error: None zero returncode: vsearch --fastq_mergepairs prepped/tmp/reindex_fastq_F/SRR26996752.fastq --reverse prepped/tmp/reindex_fastq_R/SRR26996752.fastq --fastqout prepped/tmp/joined/SRR26996752.fastq --threads 1 --fastq_allowmergestagger --fastq_maxdiffs 500 --fastq_minovlen 20 --fastq_minmergelen 100

When I tried to run vsearch by itself using the same command:

vsearch --fastq_mergepairs prepped/tmp/reindex_fastq_F/SRR26996752.fastq --reverse prepped/tmp/reindex_fastq_R/SRR26996752.fastq --fastqout prepped/tmp/joined/SRR26996752.fastq --threads 1 --fastq_allowmergestagger --fastq_maxdiffs 500 --fastq_minovlen 20 --fastq_minmergelen 100

vsearch v2.27.0_macos_x86_64, 16.0GB RAM, 4 cores
https://github.com/torognes/vsearch

Merging reads 0%

Fatal error: Invalid line 3 in FASTQ file: '+' line must be empty or identical to header
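The rule in that error message can be checked ahead of time. The following is a minimal Python sketch, not part of PIPITS or vsearch: it assumes strict four-line FASTQ records (no wrapped sequence lines), and the function names plus_line_ok and find_bad_plus_lines are hypothetical helpers introduced only for illustration.

```python
def plus_line_ok(header: str, plus: str) -> bool:
    """True if the '+' line is bare or repeats the text after '@' exactly."""
    if not header.startswith("@") or not plus.startswith("+"):
        return False
    return plus[1:] == "" or plus[1:] == header[1:]


def find_bad_plus_lines(lines):
    """Return 1-based record numbers whose '+' line breaks the rule.

    `lines` is a list of FASTQ lines with trailing newlines removed.
    Assumes four lines per record; a trailing partial record is ignored.
    """
    bad = []
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        if len(record) < 4:
            break
        if not plus_line_ok(record[0], record[2]):
            bad.append(start // 4 + 1)
    return bad
```

For a file on disk, something like find_bad_plus_lines(open(path).read().splitlines()) would list the offending records.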

Looking at the FASTQ files, the identifier lines don't match up: the header line reads @SRR26996752_1, while the '+' line still carries the original identifier and description:

@SRR26996752_1
GTGAATCATCGAATCTTTGAACGCACATTGCGCCCTCTGGTATTCCAGGGGGCATGCCTGTTTGAGCGTCATTTCCTTCTCAAACCCTCGGGTTTGGTAGTGAGTGGTACTCTTTCTGGGTTAACTTGAAAATGCTGGCCATCTGGCTGTTGCTGACTGAGGTTTTAGTCCAGTCCGCTGATACTCTGCGTATTAGGTTTTACCAACTCGTAGTGGCGTTAGTAGGCGTTTTAAAGGCTTTTACTGAAAGTACAGACAGTCTGGCAAACAGTATTCATAAAGTTTGACCTCAAATCAGGTT
+SRR26996752.1 M05298:130:000000000-JV7YV:1:1101:19851:1275 length=301
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGEEGGGGGGGGGGGGGGGGGGGGDGGGGG=EGGGGGGGGEGGGFFFFGBGGGGC<FFFFFAA6CEEF446FFGFF;F?GF)3CECEA5):@ADCAA)6<>BF@BFFFF3<)
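One possible workaround, sketched here in Python rather than taken from PIPITS itself, is to blank out the repeated identifier so that every '+' line is bare, which the vsearch message above says is accepted. The sketch assumes strict four-line records (no wrapped sequence or quality lines); the function name blank_plus_lines and the file paths are hypothetical.

```python
def blank_plus_lines(src_path: str, dst_path: str) -> int:
    """Copy a FASTQ file, replacing every '+...' separator with a bare '+'.

    Assumes four lines per record, so the separator is always the third
    line of each record. Returns the number of records written.
    """
    records = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for index, line in enumerate(src):
            if index % 4 == 2:
                # Position-based check: a quality line may also start with '+',
                # so the line index, not the first character, decides.
                if not line.startswith("+"):
                    raise ValueError(f"line {index + 1} is not a '+' separator")
                line = "+\n"
                records += 1
            dst.write(line)
    return records
```

Whether this needs to be applied to the files in rawdata or to the reindexed files under prepped/tmp depends on where the mismatch is introduced, which the output above does not show.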

This happens both on new data downloaded from BioProject for a student project and on some in-house generated data that had previously run without issue. It has happened to me on both a new install on a MacBook (macOS) and a Debian install.

Thanks for your help.