google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

Error while running tests on Calling variants in non-autosomal contigs #853

Closed: poddarharsh15 closed this issue 1 month ago

poddarharsh15 commented 1 month ago

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.6.1/docs/FAQ.md:

Describe the issue: I am getting errors while calling variants on chrX. Could you please have a look? I have attached an error.txt file for reference.
Thank you.

INVALID_ARGUMENT: Couldn't fetch bases for reference_name: "chrX" start: 14000 end: 15000

Setup

Steps to reproduce:

REF="GRCh38_no_alt_analysis_set.fasta"
BAM="HG002.pfda_challenge.grch38.chrXY.bam"
THREADS=$(nproc)
REGION="chrX chrY"
HAPLOID_CONTIGS="chrX,chrY"
PAR_BED="GRCh38_PAR.bed"

udocker run \
-v "${INPUT_DIR}":"${INPUT_DIR}" \
-v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \
google/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/run_deepvariant \
--model_type PACBIO \
--ref "${INPUT_DIR}/${REF}" \
--reads "${INPUT_DIR}/${BAM}" \
--output_vcf "${OUTPUT_DIR}/${OUTPUT_VCF}" \
--output_gvcf "${OUTPUT_DIR}/${OUTPUT_GVCF}" \
--num_shards "${THREADS}" \
--haploid_contigs "${HAPLOID_CONTIGS}" \
--par_regions_bed "${INPUT_DIR}/${PAR_BED}" \
--regions "${REGION}" \
--intermediate_results_dir "${OUTPUT_DIR}/${INTERMEDIATE_DIRECTORY}"  

error.txt

kishwarshafin commented 1 month ago

@poddarharsh15, can you please check whether the files were downloaded correctly and whether their sizes look right? The case studies are designed so that you can simply copy and paste the commands and they should work. I just tested the case study and it worked on my end.
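A "Couldn't fetch bases" error is consistent with a FASTA file that is shorter than its index says it should be, for example after an interrupted download. The sketch below is one way to check that, assuming an uncompressed FASTA with a samtools-style .fai index next to it (columns: name, length, offset, bases per line, bytes per line). The helper name fasta_covers_region is hypothetical and not part of DeepVariant; it only compares byte offsets with the file size and does not validate the sequence itself.

```shell
# Hypothetical helper, not part of DeepVariant.
# Usage: fasta_covers_region FASTA CONTIG END   (END = 1-based last base of the region)
# Assumes an uncompressed FASTA with a samtools-style index at FASTA.fai.
fasta_covers_region() {
  local fasta="$1" contig="$2" end="$3"
  local fai="${fasta}.fai"
  if [ ! -f "$fasta" ] || [ ! -f "$fai" ]; then
    echo "missing FASTA or .fai index"
    return 2
  fi
  local size
  size=$(( $(wc -c < "$fasta") ))   # FASTA size in bytes

  # .fai columns: $1 name, $2 length, $3 byte offset of first base,
  #               $4 bases per line, $5 bytes per line (incl. newline)
  awk -F'\t' -v c="$contig" -v e="$end" -v size="$size" '
    $1 == c {
      found = 1
      if (e > $2) { msg = "region end is beyond the contig length"; rc = 1; exit }
      last = e - 1                                   # 0-based index of the last base
      need = $3 + int(last / $4) * $5 + (last % $4) + 1   # bytes the file must contain
      if (need > size) { msg = "FASTA is shorter than the index implies"; rc = 1; exit }
      msg = "ok"; rc = 0; exit
    }
    END {
      if (!found) { msg = "contig not found in index"; rc = 1 }
      print msg
      exit rc
    }
  ' "$fai"
}
```

For the region in the error message, a call would look like: fasta_covers_region "${INPUT_DIR}/${REF}" chrX 15000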

poddarharsh15 commented 1 month ago

It worked, thanks for the help :))

poddarharsh15 commented 1 month ago

Hi @kishwarshafin,

I am trying to add --haploid_contigs="chrX,chrY" in a module from nf-core that I am using. However, when running the command line, I am only detecting chrX variants from the test data. When I try to run the command using a BED file with only chrY, I get an empty VCF file with headers as the result. I also tried using --regions as a parameter, but without success. Could you please suggest some ideas on how to resolve this issue? Thank you for your assistance.

/opt/deepvariant/bin/run_deepvariant \
      --ref=GRCh38_no_alt_analysis_set.fasta \
      --reads=sample1-lane_1.converted.cram \
      --output_vcf=sample1-lane_1.deepvariant.vcf.gz \
      --output_gvcf=sample1-lane_1.deepvariant.g.vcf.gz \
      --haploid_contigs="chrX,chrY" \
      --regions="chrX chrY" \
      --model_type PACBIO \
      --regions=chrX_10001-44821.bed \
      --intermediate_results_dir=tmp \
      --num_shards=12

chrY.vcf.gz sample1-lane_1.deepvariant.vcf.gz
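To see which contigs actually have records in an output VCF, the records can be counted per contig. This is a minimal sketch that reads uncompressed VCF text on standard input; the function name vcf_contig_counts is hypothetical, and it counts every non-header line regardless of genotype or filter status.

```shell
# Hypothetical helper: count VCF records per contig.
# Reads uncompressed VCF text on stdin, prints "contig<TAB>count" sorted by contig name.
vcf_contig_counts() {
  awk -F'\t' '
    !/^#/ && NF { n[$1]++ }                      # skip header lines and blank lines
    END { for (c in n) print c "\t" n[c] }
  ' | sort
}
```

For a compressed file, the input can be piped in, for example: gzip -dc sample1-lane_1.deepvariant.vcf.gz | vcf_contig_counts. An empty result means the file contains only header lines.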

kishwarshafin commented 1 month ago

@poddarharsh15,

I see you are using --regions=chrX_10001-44821.bed, which means DV will run on that region only. Can you remove this line and see if that fixes it?
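The command above passes --regions twice, and which value takes effect depends on the flag parser. When the arguments are assembled by a wrapper such as a pipeline module, a small pre-flight check can catch repeats before the run starts. This is a sketch only; the helper name repeated_flags is hypothetical, and it assumes flags start with "--" and that no argument contains a newline.

```shell
# Hypothetical helper: print every "--flag" that appears more than once
# in the given argument list (both "--flag=value" and "--flag value" forms).
repeated_flags() {
  printf '%s\n' "$@" |
    awk '
      /^--/ { sub(/=.*/, ""); count[$0]++ }      # keep only the flag name
      END   { for (f in count) if (count[f] > 1) print f }
    ' | sort
}
```

Called with the arguments from the command above, this would print --regions; it prints nothing when every flag is given once.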