roland-rad-lab / MoCaSeq

Analysis pipelines for cancer genome sequencing in mice.

Couldn't read file MGP.v5.snp_and_indels.exclude_wild.vcf.gz #20

Open mabba777 opened 1 year ago

mabba777 commented 1 year ago

Dear developers,

I was running the MoCaSeq test. The reference genome was downloaded and indexed correctly, and the FASTQ files were aligned and processed in the /temp folder, but the resulting BAM files did not appear in the MoCaSeq_test/results/BAM folder, so no variants are detected. The files in question are:

MoCaSeq_Test.Normal.cleaned.sorted.readgroups.marked.bam
MoCaSeq_Test.Normal.cleaned.sorted.readgroups.marked.bam.bai
MoCaSeq_Test.Tumor.cleaned.sorted.readgroups.marked.bam
MoCaSeq_Test.Tumor.cleaned.sorted.readgroups.marked.bam.bai

Please find the MoCaSeq report attached: MoCaSeq_Test.report.txt

I also ran a real tumor-normal pair, which reproduces the same issue and additionally runs into a problem when marking duplicate reads during BAM processing. See the report:
C7HI1.report.txt

Do you have any advice? Thanks a lot.

NikdAK commented 1 year ago

The crucial error line is this one:

A USER ERROR has occurred: Couldn't read file file:///var/pipeline/ref/GRCm38.p6/MGP.v5.snp_and_indels.exclude_wild.vcf.gz. Error was: It doesn't exist.

Apparently the script "Preparation_GenerateSangerMouseDB.sh" did not run successfully, so the Sanger mouse SNP database is missing. This one-time processing step involves some heavy downloads, so perhaps it failed during your initial genome generation? The final file should be about 3 GB and is assembled from the individual Sanger VCF files. In principle, you can simply run that script, either inside or outside Docker, to generate the missing files.
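Before rerunning the pipeline, it may help to confirm that the database file actually exists. The following is a minimal sketch, assuming the path from the error message quoted above is where MoCaSeq looks for it; check_sanger_db is a hypothetical helper, not part of MoCaSeq.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of MoCaSeq): check that the Sanger
# SNP database exists and is non-empty. The default path is the one from the
# error message in this thread.

check_sanger_db() {
  local vcf="${1:-/var/pipeline/ref/GRCm38.p6/MGP.v5.snp_and_indels.exclude_wild.vcf.gz}"
  # -s is true only for a file that exists and has a size greater than zero.
  if [ ! -s "$vcf" ]; then
    echo "missing or empty: $vcf" >&2
    return 1
  fi
  local bytes
  # wc -c works on both GNU and BSD systems; tr strips BSD's padding spaces.
  bytes=$(wc -c < "$vcf" | tr -d ' ')
  echo "found: $vcf ($bytes bytes)"
}
```

Once the preparation script has completed, running check_sanger_db inside the container should report a byte count in the region of the ~3 GB mentioned above. It only checks presence and size, not whether the VCF contents are valid.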

mabba777 commented 1 year ago

You are right: MGP.v5.snp_and_indels.exclude_wild.vcf.gz was not generated. I tried running Preparation_GenerateSangerMouseDB.sh (both the version shipped with the Docker image and the updated version) without success. I think the problem is a "missing URL" error from this command:

wget -nv -c -r -P ftp://ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-1505-SNPs_Indels/strain_specific_vcfs/

Where can I download the MGP.v5.snp_and_indels.exclude_wild.vcf.gz file?
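The "missing URL" message is consistent with how wget parses that command line: -P (--directory-prefix) takes the next argument as its directory, so the FTP address is consumed as a folder name and no URL is left to fetch. Possibly a directory variable in the script expanded to an empty string, though that is only a guess. Below is a minimal sketch of a corrected download, assuming the EBI path quoted above is still served; download_sanger_vcfs and its destination argument are hypothetical, not part of MoCaSeq. It only fetches the per-strain VCFs, not the merged file the pipeline builds from them.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of MoCaSeq: fetch the Sanger MGP REL-1505
# strain-specific VCFs into the directory given as the first argument.

SANGER_URL="ftp://ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-1505-SNPs_Indels/strain_specific_vcfs/"

download_sanger_vcfs() {
  local dest_dir="${1:-}"
  # Refuse an empty destination: "wget -P <URL>" would consume the URL as the
  # directory prefix, and wget would then stop with a missing-URL error.
  if [ -z "$dest_dir" ]; then
    echo "error: no destination directory given" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  # -nv: terse output            -c: resume partially downloaded files
  # -r: recurse into the listing -np: never ascend to the parent directory
  # -nH --cut-dirs=5: drop the host name and the five leading path components
  #   (pub/databases/mousegenomes/REL-1505-SNPs_Indels/strain_specific_vcfs)
  #   so the files land directly in dest_dir
  wget -nv -c -r -np -nH --cut-dirs=5 -P "$dest_dir" "$SANGER_URL"
}
```

Usage would look like `download_sanger_vcfs /data/sanger_vcfs`, where the directory is a placeholder. Building MGP.v5.snp_and_indels.exclude_wild.vcf.gz from these files would still be the job of Preparation_GenerateSangerMouseDB.sh; its merging steps are not shown in this thread.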