Closed angelaparodymerino closed 4 years ago
Hi Angela, thank you for all your feedback. Based on the comments you've provided, I've gone back through all the scripts and have indeed found some small errors that are leading to the problems you mention. I am in the process of updating all the scripts and will release them very soon (hopefully in the next few days) under v4.1. Please hang tight, and thanks again! Iago
Hi Angela, it took me longer than I'd hoped to iron it all out, but v4.1 is now live. All four issues you've flagged have been addressed. In some cases, the root cause was a simple typo in the example line in the User Manual. In others, small inconsistencies in the code were the culprit. Anyway, everything should be good to go now -- please let us know how things go. Iago
Hi,
Since I am stuck at step 5 with the tutorial data, I have been moving ahead with the analysis of my own dataset. I was trying to get to step 5 (as I did with the tutorial data), but I got stuck at step 4. This is the run:
It creates the following output files, but they are all empty:
- Pear.log
- PosToMask.txt
- VsearchIN.fa <--- This one was not a final output when I successfully ran STEP 4 with the tutorial data.
- GSC.MR.Clusters.fa
- GSC.MR.Genome.fa
and a folder: FastaForRef
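In case it helps to reproduce the check, this is a minimal Python sketch of how the empty outputs could be confirmed. The file names are the ones listed above ("GSC" being the prefix of this run); the function name `find_empty_outputs` and the `"."` directory are placeholders chosen for illustration and are not part of GBS-SNP-CROP.

```python
import os

# File names copied from the list above; "GSC" is the prefix used in this run.
EXPECTED_OUTPUTS = [
    "Pear.log",
    "PosToMask.txt",
    "VsearchIN.fa",
    "GSC.MR.Clusters.fa",
    "GSC.MR.Genome.fa",
]


def find_empty_outputs(directory, names=EXPECTED_OUTPUTS):
    """Return the names that exist in `directory` but have a size of 0 bytes."""
    empty = []
    for name in names:
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    # "." is a placeholder for the directory where STEP 4 was run.
    for name in find_empty_outputs("."):
        print(f"{name} is empty")
```

Files that are missing altogether are not reported, only files that exist with zero bytes, which matches the situation described here.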
More details:
1) I actually have 18 samples, but I decided to run the pipeline with only 3 to see if it works (it should be faster). barcodesID.txt looks like this:
2) As I said previously (https://github.com/halelab/GBS-SNP-CROP/issues/23#issuecomment-529711985), the input files were unzipped (this is required for the PEAR version I am using) and lines 130 and 131 of the STEP 4 script were modified accordingly. This was done for the tutorial data and it solved one of the issues I was having.
3) I noticed a difference when comparing the tutorial input files with my own. The tutorial input files for STEP 4 are 3-5 MB each, while my fastq files are 500-700 MB each, which is quite a difference! Could this be a problem? Could it be related to the error message? (The message actually seems to point to vsearch rather than to memory, but I am not sure.)
4) My fastq files contain quite a lot of repetitive sequences (on average, 30% of the sequences are over-represented), which is a consequence of how the library preparation was done. This is probably the main reason my fastq files are so much bigger than the tutorial ones. Is having many repetitive sequences going to be a problem?
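For reference, a rough way to quantify the repetition in a fastq file could look like the Python sketch below. It assumes an uncompressed file with standard 4-line records, and it counts exact duplicate reads, which is not the same measure as the "over-represented sequences" figure reported by QC tools. The function name `duplicate_fraction` is a placeholder for illustration.

```python
from collections import Counter


def duplicate_fraction(fastq_path):
    """Fraction of reads whose exact sequence already occurred earlier in the file.

    Assumes an uncompressed FASTQ file with 4 lines per record, where the
    second line of each record is the sequence.
    """
    counts = Counter()
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # sequence line of each record
                counts[line.strip()] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every read beyond the first copy of a sequence counts as a duplicate.
    return 1.0 - len(counts) / total
```

All distinct sequences are held in memory, so for files of several hundred MB it may be more practical to run this on a subsample of reads.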
Let me know if you need more details.
Thanks in advance,
Angela Parody Merino