halelab / GBS-SNP-CROP

GBS SNP Calling Reference Optional Pipeline
GNU General Public License v2.0

Failed to open files.. Illegal division by zero at GBS-SNP-CROP-4.pl line 218 #21

Closed angelaparodymerino closed 4 years ago

angelaparodymerino commented 5 years ago

Hi,

Briefly, I am a biologist and I am using GBS-SNP-CROP for the first time. I am following the tutorial and using the tutorial material. I managed to complete steps 1 to 3 but got stuck at step 4. The barcodesID.txt file that I downloaded from GitHub for the tutorial looks like this:

TGACGCCA   Lib1_01  NO
CAGATA     Lib1_02  NO
GAAGTG     Lib1_03  NO
TAGCGGAT   Lib1_04  YES
TATTCGCAT  Lib1_05  NO

Which means that for step 4 only library Lib1_04 will be considered. I am not sure why the other libraries are not taken into account; they look fine. I have checked that the number of reads is equal between R1 and R2 for each genotype, so we can rule this out as the cause of the problem.
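A read-count check of this kind can be sketched in shell as follows. The function names `count_reads` and `check_pair` are illustrative and not part of GBS-SNP-CROP; the sketch assumes standard four-line FASTQ records and the `<genotype>.R1.fq.gz` / `<genotype>.R2.fq.gz` naming seen in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of GBS-SNP-CROP.

# Count reads in a FASTQ file (assumes 4 lines per record).
# Works whether the file is really gzip-compressed or plain text.
count_reads() {
    local file="$1" lines
    if gzip -t "$file" 2>/dev/null; then
        lines=$(gzip -dc "$file" | wc -l)
    else
        lines=$(wc -l < "$file")
    fi
    echo $(( lines / 4 ))
}

# Compare R1 and R2 read counts for one genotype prefix, e.g. Lib1_04.
check_pair() {
    local prefix="$1" r1 r2
    r1=$(count_reads "${prefix}.R1.fq.gz")
    r2=$(count_reads "${prefix}.R2.fq.gz")
    if [ "$r1" -eq "$r2" ]; then
        echo "${prefix}: OK (${r1} read pairs)"
    else
        echo "${prefix}: MISMATCH (R1=${r1}, R2=${r2})"
        return 1
    fi
}
```

Running `check_pair Lib1_04` in the folder that holds the demultiplexed files reports whether the two mates agree. A count of zero on both sides would also be worth noticing, since it means the files are empty.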

The error message I am getting is the following:

$ perl GBS-SNP-CROP-4.pl -pr /usr/local/bin/pear -vs /usr/local/bin/vsearch -d PE -b barcodesID.txt -t 10 -cl consout -rl 150 -pl 32 -p 0.01 -id 0.93 -min 32 -MR MR

#################################

GBS-SNP-CROP, Step 4, v.4.0

#################################

Parsing Paired-End reads...

Failed to open files.. Illegal division by zero at GBS-SNP-CROP-4.pl line 218.

It produces the output files but all of them are empty. The PEAR.log looks like this:

$ cat Pear.log

PEAR (Zhang et al., 2014) summary results:

Analyzing paired Lib1_04.R1.fq.gz and Lib1_04.R2.fq.gz reads...


[PEAR ASCII-art banner]

PEAR v0.9.2 [March 26 2014]

Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593

Forward reads file.................: Lib1_04.R1.fq.gz
Reverse reads file.................: Lib1_04.R2.fq.gz
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 32
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 10

Manually stitching together unassembled reads results:

I have read that this might happen when R1 and R2 do not have the same number of reads, but as I said this is not my case. Another thing that comes to mind is the version of PEAR. The version I have is v0.9.2, so I suspect the problem might be that I am not using the latest version of PEAR. However, the latest version is not free! Does anyone know whether I am right in saying that GBS-SNP-CROP (v.4.0) would not work with PEAR v0.9.2?

I should add that I unzipped (gunzip) the input files because I was also getting another error message: "Too large read? Allocate more mem...". I had read in a forum that with PEAR v0.9.2 the input files should be unzipped (right?).

This is how the Pear.log file looks when using gzipped files as input:

$ cat Pear.log

PEAR (Zhang et al., 2014) summary results:

Analyzing paired Lib1_04.R1.fq.gz and Lib1_04.R2.fq.gz reads...


[PEAR ASCII-art banner]

PEAR v0.9.2 [March 26 2014]

Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593

Forward reads file.................: Lib1_04.R1.fq.gz
Reverse reads file.................: Lib1_04.R2.fq.gz
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 32
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 10

Allocating memory..................: 200,000,000 bytes
Computing empirical frequencies....:

Manually stitching together unassembled reads results:

In short, what should I do to solve my problem and be able to finish step 4 using the tutorial material?

Let me know if you need more details.

Thanks in advance,

Angela Parody Merino

angelaparodymerino commented 5 years ago

Ah! I forgot to mention that line 218 refers to: my $unstitched_percentage = ( $unstitched_tally / ( $stitched_tally + $unstitched_tally ) ) * 100;

I have read in other forums about the same error although not in the same line but the line shows the same formula. Anyway, I could not find out how to fix the problem.
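The formula quoted above divides by the sum of the two tallies, so it fails whenever both are zero, which is what happens if no reads were parsed at all. The sketch below illustrates that guard in shell; it is not the pipeline's Perl code, and the function name `unstitched_pct` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the percentage formula from line 218,
# with an explicit guard for the zero-denominator case.
# Usage: unstitched_pct <stitched_tally> <unstitched_tally>
unstitched_pct() {
    local stitched="$1" unstitched="$2"
    local total=$(( stitched + unstitched ))
    if [ "$total" -eq 0 ]; then
        # No reads were counted: report NA instead of dividing by zero.
        echo "NA"
        return 1
    fi
    LC_ALL=C awk -v u="$unstitched" -v t="$total" \
        'BEGIN { printf "%.2f\n", (u / t) * 100 }'
}
```

Here `unstitched_pct 75 25` prints `25.00`, while `unstitched_pct 0 0` prints `NA` and returns a non-zero status. In other words, the division-by-zero message is a symptom of empty input to this step rather than a fault in the formula itself.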

Thanks in advance again!

Angela Parody Merino

arthurmelobio commented 5 years ago

Dear Angela,

Thank you for using GBS-SNP-CROP in your studies.

  1. The third column (YES/NO) of the barcode ID file only determines which genotypes are used for the mock reference assembly. After that, all listed genotypes are aligned and genotyped. Please take a look at the User Manual for more detail.

  2. Regarding your errors, the main one is "Failed to open files..", which makes me wonder whether you're running GBS-SNP-CROP-4.pl within the "demultiplexed" folder. Please make sure! The line 218 error occurs because the script cannot open the files Lib1_04.R1.fq.gz and Lib1_04.R2.fq.gz.

  3. The PEAR version we used to develop GBS-SNP-CROP is v.0.9.6 and you're using v.0.9.2. Honestly, I don't think this is the issue. However, before installing PEAR v.0.9.6, check that you're running step 4 within the "demultiplexed" folder, or that all Lib1_*.gz files are in the directory where you're running the script.

  4. "I have to say that I unzipped (gunzip) the intput files..." Oops... I think this is causing the error. If your files are not gzipped, please remove the comma and gz (", "gz") from lines 131 and 132.
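The checks suggested in the points above (files present in the working directory, and actually gzip-compressed) can be sketched as a small pre-flight script. This is an illustration only: the function name `preflight` is hypothetical, and it assumes a whitespace-separated barcode ID file (barcode, genotype name, YES/NO) and the `<genotype>.R1.fq.gz` / `<genotype>.R2.fq.gz` naming seen in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of GBS-SNP-CROP.
# Usage: preflight <barcode ID file> [directory holding the reads]
preflight() {
    local barcodes="$1" dir="${2:-.}" status=0 name mate file
    # Column 2 of each line is taken to be the genotype name.
    while read -r _ name _ || [ -n "$name" ]; do
        [ -n "$name" ] || continue
        for mate in R1 R2; do
            file="${dir}/${name}.${mate}.fq.gz"
            if [ ! -s "$file" ]; then
                echo "MISSING or empty: $file"
                status=1
            elif ! gzip -t "$file" 2>/dev/null; then
                # A .gz name on an uncompressed file fails this test.
                echo "NOT valid gzip: $file"
                status=1
            fi
        done
    done < "$barcodes"
    return "$status"
}
```

Calling `preflight barcodesID.txt` from the directory where step 4 will run prints one line per problem file and returns a non-zero status if any were found. Files that were gunzipped but kept their `.gz` name show up as "NOT valid gzip".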

Please, let me know if you're able to move forward.

Best, Arthur

angelaparodymerino commented 5 years ago

Hi Arthur,

Thanks for your quick reply.

  1. Ok

  2. Actually, I ran GBS-SNP-CROP-4.pl not within the "demultiplexed" folder but in the v.4.0 folder:

~/GBS-SNP-CROP/GBS-SNP-CROP-scripts/v.4.0$ ls
barcodesID.txt  demultiplexed  distribs  FastaCQonLibraries  FastaForRef
GBS-SNP-CROP-1.pl  GBS-SNP-CROP-2.pl  GBS-SNP-CROP-3.pl  GBS-SNP-CROP-4.pl  GBS-SNP-CROP-5.pl
GBS-SNP-CROP-6.pl  GBS-SNP-CROP-7.pl  GBS-SNP-CROP-8.pl  GBS-SNP-CROP-9.pl  InitialFastaqFiles
Lib1_01.R1.fq.gz  Lib1_01.R2.fq.gz  Lib1_02.R1.fq.gz  Lib1_02.R2.fq.gz  Lib1_03.R1.fq.gz
Lib1_03.R2.fq.gz  Lib1_04.R1.fq.gz  Lib1_04.R2.fq.gz  Lib1_05.R1.fq.gz  Lib1_05.R2.fq.gz
OutputsStep3  parsed  singles  summaries

  3. Ok, then the PEAR version seems not to be the issue (which is good news).

  4. Ok, then I leave files unzipped.

--> So, I still have the issue. This is what I ran within the v.4.0 folder (command copied and pasted from the tutorial):

~/GBS-SNP-CROP/GBS-SNP-CROP-scripts/v.4.0$ perl GBS-SNP-CROP-4.pl -pr /usr/local/bin/pear -vs /usr/local/bin/vsearch -d PE -b barcodesID.txt -t 10 -cl consout -rl 150 -pl 32 -p 0.01 -id 0.93 -min 32 -MR MR

#################################

GBS-SNP-CROP, Step 4, v.4.0

#################################

Parsing Paired-End reads...

2 Error, too large read? Allocate more mem...
Illegal division by zero at GBS-SNP-CROP-4.pl line 218.

--> Output files are created but they are empty, and the script ran for just a second:

~/GBS-SNP-CROP/GBS-SNP-CROP-scripts/v.4.0$ ls
barcodesID.txt  demultiplexed  distribs  FastaCQonLibraries  FastaForRef
GBS-SNP-CROP-1.pl  GBS-SNP-CROP-2.pl  GBS-SNP-CROP-3.pl  GBS-SNP-CROP-4.pl  GBS-SNP-CROP-5.pl
GBS-SNP-CROP-6.pl  GBS-SNP-CROP-7.pl  GBS-SNP-CROP-8.pl  GBS-SNP-CROP-9.pl  InitialFastaqFiles
Lib1_01.R1.fq.gz  Lib1_01.R2.fq.gz  Lib1_02.R1.fq.gz  Lib1_02.R2.fq.gz  Lib1_03.R1.fq.gz
Lib1_03.R2.fq.gz  Lib1_04.assembled.fasta  Lib1_04.R1.fq.gz  Lib1_04.R2.fq.gz  Lib1_04.stitched.fasta
Lib1_04.unassembled.forward.fastq  Lib1_04.unassembled.reverse.fastq  Lib1_05.R1.fq.gz  Lib1_05.R2.fq.gz
OutputsStep3  parsed  Pear.log  singles  summaries

--> Pear.log file looks like this (my comment: it looks like it stops at "computing empirical frequencies"(??)):

PEAR (Zhang et al., 2014) summary results:

Analyzing paired Lib1_04.R1.fq.gz and Lib1_04.R2.fq.gz reads...


[PEAR ASCII-art banner]

PEAR v0.9.2 [March 26 2014]

Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593

Forward reads file.................: Lib1_04.R1.fq.gz
Reverse reads file.................: Lib1_04.R2.fq.gz
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 32
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 10

Allocating memory..................: 200,000,000 bytes
Computing empirical frequencies....:

Manually stitching together unassembled reads results:

--> Could it be a problem of memory on my computer?

Thanks in advance,

Regards,

Angela Parody Merino

halelab commented 5 years ago

Hi Angela, we've had some personnel changes in the lab recently, with Arthur moving on to a job elsewhere. I apologize that this thread got dropped. May I ask what the current status of this issue is? Please let me know and I will do my best to help.

Very best, and sorry for the dropped communication here! Iago

angelaparodymerino commented 5 years ago

Thanks for your answer,

I think we can now close this issue because the error "Failed to open files...Illegal division by zero at GBS-SNP-CROP-4.pl line 218" was solved.

Regards,

Angela