google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

DeepVariant running slow #856

Closed DineshRavindraRaju closed 3 months ago

DineshRavindraRaju commented 3 months ago

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.6.1/docs/FAQ.md:

Describe the issue:

Hello All,

I have been testing ONT datasets on our HPC cluster to benchmark and optimize DeepVariant runs. With the mapped ONT BAM files for HG002 and HG003 from the UCSC studies, DeepVariant gets stuck at the make_examples stage. Even after 24 hours it is still in that stage, which is unusual. I would appreciate your input on this issue.

Setup

Steps to reproduce:

apptainer exec --bind Deepvariant/HG002_HG003_1.5.0 deepvariant_1.5.0.sif \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type ONT_R104 \
  --ref Homo_sapiens_assembly38.fasta \
  --reads HG002_GRCh38_ONT-UL_UCSC_20200508.phased.bam \
  --output_vcf HG002_chr1.output.vcf.gz \
  --output_gvcf HG002_chr1.output.g.vcf.gz \
  --regions chr1 \
  --num_shards 56 \
  --logging_dir chr1 \
  --intermediate_results_dir chr1/intermediate_results

Does the quick start test work on your system? Please test with https://github.com/google/deepvariant/blob/r1.6/docs/deepvariant-quick-start.md. Is there any way to reproduce the issue by using the quick start?

Yes, it did work

Any additional context:

kishwarshafin commented 3 months ago

@DineshRavindraRaju, you are using R9 data, which is a lot noisier than the R10.4 data we trained our models on. For R9, we recommend using PEPPER-DeepVariant; you can see it mentioned on our GitHub landing page. The current model is for R10.4 chemistry only.
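
For anyone unsure which chemistry a BAM came from, one rough check is to look at the read-group lines in the header. The sketch below is only an illustration: `rg_fields` is a hypothetical helper name, and it assumes the file was produced by a basecaller that records platform or model details in the `@RG` `PL`, `PM`, or `DS` fields. Many older or realigned files do not carry this information, so an empty result proves nothing either way.

```shell
# Hypothetical helper: read SAM header text on stdin and print the
# ID, PL, PM and DS fields of every @RG line, one field per line.
# Assumption: the basecaller wrote chemistry/model details into these
# fields; if it did not, nothing useful will be printed.
rg_fields() {
  grep '^@RG' | tr '\t' '\n' | grep -E '^(ID|PL|PM|DS):'
}

# Usage (assumes samtools is on PATH):
#   samtools view -H HG002_GRCh38_ONT-UL_UCSC_20200508.phased.bam | rg_fields
```

If the header does not say, the release notes of the dataset itself are the more reliable source for the flow cell and chemistry.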