halelab / GBS-SNP-CROP

GBS SNP Calling Reference Optional Pipeline
GNU General Public License v2.0

Can't exec "usr/local/bin/vsearch": No such file or directory at GBS-SNP-CROP-4.pl line 265. Unable to open VsearchOUT.fa file #25

Closed angelaparodymerino closed 4 years ago

angelaparodymerino commented 5 years ago

Hi,

Since I am stuck in step 5 with the tutorial data, I have been advancing the analysis of my own dataset. I was trying to get to step 5 (as I did with the tutorial data) but I got stuck in step 4. This is the run:

perl GBS-SNP-CROP-4.pl -pr /usr/local/bin/pear -vs usr/local/bin/vsearch -d PE -b barcodesID.txt -t 10 -cl consout -rl 91 -pl 32 -p 0.01 -id 0.93 -db 3 -min 32 -MR GSC.MR

#################################
# GBS-SNP-CROP, Step 4, v.4.0
#################################
Parsing Paired-End reads...

DONE.

Can't exec "usr/local/bin/vsearch": No such file or directory at GBS-SNP-CROP-4.pl line 265.
Unable to open VsearchOUT.fa file

It creates the following output files, but empty:

- Pear.log
- PosToMask.txt
- VsearchIN.fa <--- This one was not a final output when I successfully ran STEP 4 with the tutorial data.
- GSC.MR.Clusters.fa
- GSC.MR.Genome.fa

and a folder: FastaForRef

More details:

1) I actually have 18 samples, but I decided to run the pipeline with only 3 to see if it works (it should be faster). barcodesID.txt looks like this:

TTGTACGGT       Lib1_3A YES
GGTGCAATA       Lib1_3B YES
AACAGTTAT       Lib1_3C YES
CCACTCCGA       Lib1_3D NO
TTGTCGGAT       Lib1_3E NO
GGTGACTTA       Lib1_3F NO
AACAATAGT       Lib1_4A NO
CCACCAGAT       Lib1_4B NO
TTGTGGCTA       Lib1_4C NO
GGTGTACGA       Lib1_4D NO
AACGCTAAT       Lib1_4E NO
CCAAGCTTA       Lib1_4F NO
TTGCTAGGT       Lib1_7A NO
GGTTAGCAA       Lib1_7D NO
AATCGTATT       Lib1_7E NO
CCAACGTGT       Lib1_7F NO
TTCTACGAA       Lib1_7A2        NO
GCGGTAATA       Lib1_7D2        NO

2) As I said previously (https://github.com/halelab/GBS-SNP-CROP/issues/23#issuecomment-529711985), the input files were unzipped (this is required for the PEAR versions I am using) and lines 130 and 131 of the STEP 4 script were modified accordingly. This was done for the tutorial data and it solved one of the issues I was having.

3) I have noticed two things when comparing the input files of the tutorial data with my own input files. The tutorial input files for STEP 4 are 3-5 Mb each, while my fastq files are 500-700 Mb, which is quite a difference! Could this be a problem? Could it be related to the error message? (The message actually seems to point to vsearch rather than to memory, but I am not sure.)

4) My fastq files contain quite a lot of repetitive sequences (on average, 30% of the sequences are over-represented), which is a consequence of how the library preparation was done. This is probably the main reason my fastq files are so much larger than the tutorial fastq files. Is having many repetitive sequences going to be a problem?

Let me know if you need more details.

Thanks in advance,

Angela Parody Merino

halelab commented 4 years ago

Hi Angela, thank you for all your feedback. Based on the comments you've provided, I've gone back through all the scripts and have indeed found some small errors that are leading to the problems you mention. I am in the process of updating all the scripts and will release them very soon (hopefully in the next few days) under v4.1. Please hang tight, and thanks again! Iago

halelab commented 4 years ago

Hi Angela, it took me longer than I'd hoped to iron it all out, but v.4.1 is now live. All the problems you've flagged in the past (all four issues) have been addressed. In some cases, the root cause was simple typos in the example line in the User Manual. In others, small inconsistencies in the code were the culprit. Anyway, should all be good to go now -- please let us know how things go. Iago
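For anyone hitting the same message: the path in the error, "usr/local/bin/vsearch", has no leading slash, so it is resolved relative to the current working directory rather than from the filesystem root. A minimal sketch of a pre-flight check is shown below. The helper name check_tool is hypothetical and not part of GBS-SNP-CROP. It only tests that a path is absolute and points to an executable file.

```sh
#!/bin/sh
# check_tool (hypothetical helper, not part of GBS-SNP-CROP):
# succeeds only if $1 is an absolute path to an executable regular file.
check_tool() {
    case "$1" in
        /*) ;;  # starts with "/": absolute path, continue
        *)  echo "not an absolute path: $1" >&2
            return 1 ;;
    esac
    if [ -f "$1" ] && [ -x "$1" ]; then
        return 0
    fi
    echo "not an executable file: $1" >&2
    return 1
}

# The value passed to -vs in the failing run is rejected by the check.
check_tool "usr/local/bin/vsearch" || echo "fix the -vs argument before running Step 4"
```

Running the same check on the values intended for -pr and -vs before launching Step 4 should catch a missing leading slash early, before any empty output files are written.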