halelab / GBS-SNP-CROP

GBS SNP Calling Reference Optional Pipeline
GNU General Public License v2.0

Step 4 using tutorial data, failing? #27

Closed angelaparodymerino closed 2 years ago

angelaparodymerino commented 4 years ago

Hi,

First of all, thanks for your answer and thanks for the new version!

I just tried to run Step 4 using the output files from Step 3, which I had kept on hold until you released the new version, and it gave this:

perl GBS-SNP-CROP-4.pl -pr /usr/local/bin/pear -vs /usr/local/bin/vsearch -d PE -b barcodesID.txt -rl 150 -p 0.01 -pl 32 -t 10 -cl consout -id 0.93 -min 32 -MR GSC.MR

#################################
# GBS-SNP-CROP, Step 4, v.4.1
#################################
Parsing paired-end reads for building the Mock Reference...

### Attempting to merge paired reads from files Lib1_04.R1.fq.gz and Lib1_04.R2.fq.gz...

 ____  _____    _    ____ 
|  _ \| ____|  / \  |  _ \
| |_) |  _|   / _ \ | |_) |
|  __/| |___ / ___ \|  _ <
|_|   |_____/_/   \_\_| \_\

PEAR v0.9.2 [March 26 2014]

Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR
Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593

Forward reads file.................: Lib1_04.R1.fq.gz
Reverse reads file.................: Lib1_04.R2.fq.gz
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 32
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 10

Failed to open files..
### Manually stitching together unmerged reads from files Lib1_04.unassembled.forward.fastq and Lib1_04.unassembled.reverse.fastq...

vsearch v2.13.4_linux_x86_64, 7.7GB RAM, 4 cores
https://github.com/torognes/vsearch

Fatal error: Unable to read from file (Lib1_04.AssembledStitched.fa)
DONE. See file Pear.log for details of paired-end read parsing.

vsearch v2.13.4_linux_x86_64, 7.7GB RAM, 4 cores
https://github.com/torognes/vsearch

Fatal error: Unable to read from file (tmp.fa)
vsearch v2.13.4_linux_x86_64, 7.7GB RAM, 4 cores
https://github.com/torognes/vsearch

Fatal error: Unable to read from file (VsearchIN.fa)
Unable to open VsearchOUT.fa file

The first error message says "Failed to open files..." and I cannot think why this might be. The input files (outputs from Step 3) in the folder together with the script for Step 4 are:

The generated Pear.log says:

### PEAR (Zhang et al., 2014) summary results:

### Attempting to merge paired reads from files Lib1_04.R1.fq.gz and Lib1_04.R2.fq.gz...
Command: /usr/local/bin/pear -f Lib1_04.R1.fq.gz -r Lib1_04.R2.fq.gz -o Lib1_04 -p 0.01 -n 32 -j 10

For the unmerged PE reads found in files Lib1_04.unassembled.forward.fastq and Lib1_04.unassembled.reverse.fastq:
There were no unmerged reads -- all read pairs were successfully merged!

It also generates some output files, but they are empty:

I would be grateful for any ideas on how to fix this issue. Thanks!

Kind regards,

Angela

halelab commented 4 years ago

Hi Angela -- the output from Step 3 should be a bunch of demultiplexed files in a /demultiplexed sub-directory. These are the proper input files for Step 4, not the parsed and merged files you list above. To ensure that all scripts are talking correctly to one another, I suggest you restart the analysis from Step 1, using all the v.4.1 scripts. Please refer to the schematic at the bottom of the User Manual for help with understanding the full directory structure that is created, along with where each GBS-SNP-CROP script should reside when it is run.

I'll make a better effort to check the board here on a weekly basis. Very best...we'll get this! Iago
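
Since PEAR reported "Failed to open files.." for Lib1_04.R1.fq.gz and Lib1_04.R2.fq.gz, a quick check that the read pairs are actually present and non-empty in the expected directory may help before re-running Step 4. The following is only a minimal sketch and is not part of GBS-SNP-CROP: the function name `check_pairs` is hypothetical, the default directory name `demultiplexed` is taken from the reply above, and the `*.R1.fq.gz` / `*.R2.fq.gz` naming is inferred from the log in this thread. Adjust the path to match the directory schematic in the User Manual.

```shell
#!/usr/bin/env bash
# Sanity check (a sketch, not part of GBS-SNP-CROP): confirm that every
# *.R1.fq.gz file in a directory has a readable, non-empty R2 mate.
# Assumption: the directory defaults to "demultiplexed"; pass another
# path as the first argument if your layout differs.

check_pairs() {
    local dir="${1:-demultiplexed}"
    local status=0
    local found=0
    local r1 r2 f

    if [ ! -d "$dir" ]; then
        echo "MISSING DIR: $dir"
        return 1
    fi

    for r1 in "$dir"/*.R1.fq.gz; do
        # If nothing matches, the glob stays literal; skip names that do not exist.
        [ -e "$r1" ] || continue
        found=1
        # Derive the expected mate name by swapping the R1 suffix for R2.
        r2="${r1%.R1.fq.gz}.R2.fq.gz"
        for f in "$r1" "$r2"; do
            if [ ! -r "$f" ] || [ ! -s "$f" ]; then
                echo "PROBLEM: $f is missing, unreadable, or empty"
                status=1
            fi
        done
    done

    if [ "$found" -eq 0 ]; then
        echo "NO R1 FILES in $dir"
        return 1
    fi

    [ "$status" -eq 0 ] && echo "OK: all read pairs present in $dir"
    return "$status"
}
```

After sourcing the file that contains this function, running `check_pairs demultiplexed` from the directory where Step 4 is launched prints one line per problem file, or a single OK line if every pair is in place. It only tests for presence, readability, and non-zero size; it does not validate the gzip or FASTQ contents.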