halelab / GBS-SNP-CROP

GBS SNP Calling Reference Optional Pipeline
GNU General Public License v2.0

2 Error, too large read? Allocate more mem...Illegal division by zero at GBS-SNP-CROP-4.pl line 218. #23

Closed: angelaparodymerino closed this issue 4 years ago

angelaparodymerino commented 5 years ago

Hi,

I am trying to move forward from the step where I got stuck, which is STEP 4, and I am using the tutorial dataset. This is the error message for which I have not (yet) found a solution:

2 Error, too large read? Allocate more mem...Illegal division by zero at GBS-SNP-CROP-4.pl line 218.

Details about what I did:

----> I ran GBS-SNP-CROP-4.pl not inside the "demultiplexed" folder but in the v.4.0 folder:

~/GBS-SNP-CROP/GBS-SNP-CROP-scripts/v.4.0$ ls
barcodesID.txt  GBS-SNP-CROP-4.pl  Lib1_01.R2.fq.gz  Lib1_05.R2.fq.gz
demultiplexed  GBS-SNP-CROP-5.pl  Lib1_02.R1.fq.gz  OutputsStep3
distribs  GBS-SNP-CROP-6.pl  Lib1_02.R2.fq.gz  parsed
FastaCQonLibraries  GBS-SNP-CROP-7.pl  Lib1_03.R1.fq.gz  singles
FastaForRef  GBS-SNP-CROP-8.pl  Lib1_03.R2.fq.gz  summaries
GBS-SNP-CROP-1.pl  GBS-SNP-CROP-9.pl  Lib1_04.R1.fq.gz
GBS-SNP-CROP-2.pl  InitialFastaqFiles  Lib1_04.R2.fq.gz
GBS-SNP-CROP-3.pl  Lib1_01.R1.fq.gz  Lib1_05.R1.fq.gz

----> This is the command I ran (from the tutorial):

~/GBS-SNP-CROP/GBS-SNP-CROP-scripts/v.4.0$ perl GBS-SNP-CROP-4.pl -pr /usr/local/bin/pear -vs /usr/local/bin/vsearch -d PE -b barcodesID.txt -t 10 -cl consout -rl 150 -pl 32 -p 0.01 -id 0.93 -min 32 -MR MR

----> When I run that command I obtain the following:

#################################

GBS-SNP-CROP, Step 4, v.4.0

#################################

Parsing Paired-End reads...

2 Error, too large read? Allocate more mem...Illegal division by zero at GBS-SNP-CROP-4.pl line 218.

----> Output files are created but they are empty (the new Lib1_04.* files and Pear.log listed below), and the run took just a second:

~/GBS-SNP-CROP/GBS-SNP-CROP-scripts/v.4.0$ ls
barcodesID.txt  FastaForRef  GBS-SNP-CROP-4.pl  GBS-SNP-CROP-8.pl  Lib1_01.R2.fq.gz  Lib1_03.R2.fq.gz  Lib1_04.stitched.fasta  Lib1_05.R2.fq.gz  singles
demultiplexed  GBS-SNP-CROP-1.pl  GBS-SNP-CROP-5.pl  GBS-SNP-CROP-9.pl  Lib1_02.R1.fq.gz  Lib1_04.assembled.fasta  Lib1_04.unassembled.forward.fastq  OutputsStep3  summaries
distribs  GBS-SNP-CROP-2.pl  GBS-SNP-CROP-6.pl  InitialFastaqFiles  Lib1_02.R2.fq.gz  Lib1_04.R1.fq.gz  Lib1_04.unassembled.reverse.fastq  parsed
FastaCQonLibraries  GBS-SNP-CROP-3.pl  GBS-SNP-CROP-7.pl  Lib1_01.R1.fq.gz  Lib1_03.R1.fq.gz  Lib1_04.R2.fq.gz  Lib1_05.R1.fq.gz  Pear.log

----> The Pear.log file looks like this (my comment: it seems to stop at "Computing empirical frequencies", so could it be a memory issue?):

PEAR (Zhang et al., 2014) summary results:

Analyzing paired Lib1_04.R1.fq.gz and Lib1_04.R2.fq.gz reads...


PEAR v0.9.2 [March 26 2014]

Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593

Forward reads file.................: Lib1_04.R1.fq.gz
Reverse reads file.................: Lib1_04.R2.fq.gz
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 32
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 10

Allocating memory..................: 200,000,000 bytes
Computing empirical frequencies....:

Manually stitching together unassembled reads results:


My thought: Could it be a memory problem on my computer? If so, do you have any suggestions to make it work? And if it is not a memory issue, what else could it be, and do you have any idea how it could be fixed?
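(For anyone checking the same thing: the RAM actually available on the machine can be inspected with a standard command. The "Allocating memory: 200,000,000 bytes" line above is PEAR's default buffer, which, as far as I understand, can be raised with PEAR's -y/--memory option if your build supports it; Script 4 does not expose that option by default.)

```
free -h    # show total, used and available RAM in human-readable units
```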

A question not related to this issue: Why does the barcodesID.txt file give permission to use only Lib1_04 for read clustering and assembly of the mock genome, instead of using all of the samples? I would think that a mock genome generated from more reads (as long as they are "good" reads) would be better than a mock genome generated from fewer reads.
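For reference, my understanding of the barcodesID.txt layout (from the manual) is three tab-separated columns: barcode sequence, sample ID, and a YES/NO flag marking which samples are used to build the Mock Reference. In the tutorial file only Lib1_04 is set to YES, roughly like this (the barcode sequences below are made-up placeholders, not the real tutorial barcodes):

```
ACGTACGT   Lib1_01   NO
TGCATGCA   Lib1_02   NO
GATCGATC   Lib1_03   NO
CTAGCTAG   Lib1_04   YES
AGCTAGCT   Lib1_05   NO
```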

Thanks in advance,

Regards,

Angela Parody Merino

halelab commented 5 years ago

Hi Angela, we've had some personnel changes in the lab recently, with Arthur moving on to a job elsewhere. I apologize that this thread got dropped... may I ask what the current status of this issue is? Please let me know and I will do my best to help.

My short answer is this: we built Script 4 to work with compressed FASTQ files, but I am running into inconsistencies among PEAR versions in whether they can handle such files. Please run "man pear" on your command line and see what the documentation for the version you have installed says about the kinds of input files PEAR can handle. If it cannot handle compressed (gz) files, you will indeed need to unzip them and then modify the line of code in Script 4 that Arthur previously directed you to.
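As a rough sketch of what I mean (assuming a gzip recent enough to have the -k flag, and adjusting the globs to your own library names), something along these lines would let you test an uncompressed run and find the spot to edit:

```
man pear                                      # check whether your PEAR build documents gzip'd FASTQ input
gunzip -k Lib1_0*.R1.fq.gz Lib1_0*.R2.fq.gz   # -k keeps the original .gz files next to the .fq copies
grep -in pear GBS-SNP-CROP-4.pl               # locate the PEAR invocation so it can point at the .fq files
```

The grep is only to find the invocation; the exact edit will depend on how the call is built in your copy of the script.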

Let me know, iago

angelaparodymerino commented 5 years ago

Thanks for your reply.

I have been able to fix the problem by following what you said: I checked that the PEAR version I am using (v0.9.2) needs uncompressed files, and I modified lines 131 and 132 of Script 4 accordingly. Now STEP 4, using the tutorial data, worked well and generated the expected output files :). I am having a great day.

Thanks again for the help,

Angela Parody Merino

halelab commented 4 years ago

Please see the newly released v.4.1 with an updated User Manual, and thanks for flagging the bugs. PEAR actually handles compressed files just fine; other small issues in the code were behind the problem. iago