google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

DeepVariant Variant Allele Frequency #843

Closed Wasya-the-Wolf closed 3 months ago

Wasya-the-Wolf commented 3 months ago

Hello,

Thanks for this fast and useful germline calling tool. When I used DeepVariant 1.6.0 for single-sample WES germline calling, I found that some real germline mutations with VAF (variant allele frequency) values below roughly 0.3 to 0.4 were not called. IGV views of these variants are below. Does DeepVariant take VAF into account at runtime, or apply a VAF threshold as a filter? We look forward to your reply.

Our command (all variables are defined):

singularity run \
  -B "${INPUT_DIR}":"/input","${OUTPUT_DIR}":"/output" \
  deepvariant_1.6.0.sif \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WES \
  --ref=/input/testinput/human_g1k_v37_modified.fasta \
  --reads=/input/${i}.sorted.markdup.BQSR.bam \
  --regions /input/testinput/use_agilent_region_padding_100.bed \
  --output_vcf=/output/${i}.vcf.gz \
  --output_gvcf=/output/${i}.g.vcf.gz \
  --intermediate_results_dir /output/intermediate_results_dir/${i} \
  --num_shards=10

IGV figures: EP90t_R, EP40b

kishwarshafin commented 3 months ago

Hi @Wasya-the-Wolf ,

When you say "could not be called", does that mean the variant is absent from the VCF or it's a REFCALL? The thresholds are applied in the first step of DeepVariant, make_examples; any candidates that pass the frequency thresholds are then run through the CNN for genotyping.

Wasya-the-Wolf commented 3 months ago

Thank you very much for the useful information. Which parameters do I need to set in make_examples to adjust the VAF threshold?

kishwarshafin commented 3 months ago

@Wasya-the-Wolf, the default parameters should be able to call variants with high sensitivity. Can you please answer this part of my earlier question:

When you say "could not be called", does that mean the variant is absent from the VCF or it's a REFCALL?

Wasya-the-Wolf commented 3 months ago

The variants are absent from the VCF; they are not REFCALL records. I checked the raw VCF files, and these variants do not appear anywhere in my output.
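For anyone wanting to run the same check, here is a minimal sketch that distinguishes "absent" from a RefCall record. `site_status` is a hypothetical helper written for this thread, not part of DeepVariant. It assumes a standard tab-separated VCF in which column 7 is FILTER, and it matches only records whose POS equals the queried position, so an indel that starts at an earlier position will not be found by it.

```shell
# Hypothetical helper (not a DeepVariant tool): report how a site appears in a VCF.
# Usage: site_status FILE.vcf[.gz] CHROM POS
# Prints the FILTER value of every record at CHROM:POS, or "absent" if none exists.
site_status() (
  vcf=$1; chrom=$2; pos=$3
  # Decompress only when the file name ends in .gz.
  case "$vcf" in
    *.gz) gzip -cd "$vcf" ;;
    *)    cat "$vcf" ;;
  esac | awk -F'\t' -v c="$chrom" -v p="$pos" '
    /^#/ { next }                               # skip header lines
    $1 == c && $2 == p { print $7; found = 1 }  # column 7 is FILTER
    END { if (!found) print "absent" }'
)
```

Calling `site_status sample.vcf.gz 1 12345` (placeholder file name and coordinates) would then print either the FILTER value of the record or `absent`.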

kishwarshafin commented 3 months ago

@Wasya-the-Wolf ,

I'd suggest using:

--make_examples_extra_args "vsc_min_fraction_indels=0.10,vsc_min_fraction_snps=0.10"

and setting the values to your desired fractions. The defaults are already low for WES, which means those variants should appear in the output, but you can try a smaller value and see whether it rescues some of these variants.
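As a rough illustration of where the flag goes, the sketch below prints (and does not execute) the command from the original post with the extra argument appended. `print_dv_cmd` and its three arguments are illustrative names only. The paths and other flags are copied from the original post, and the fractions passed in are placeholders to be replaced with your own values.

```shell
# Dry run: print the run_deepvariant command with the suggested flag appended.
# Usage: print_dv_cmd SAMPLE INDEL_FRACTION SNP_FRACTION
# Nothing is executed; INPUT_DIR and OUTPUT_DIR are left for the shell to expand later.
print_dv_cmd() (
  sample=$1; indel_frac=$2; snp_frac=$3
  cat <<EOF
singularity run \\
  -B "\${INPUT_DIR}":"/input","\${OUTPUT_DIR}":"/output" \\
  deepvariant_1.6.0.sif \\
  /opt/deepvariant/bin/run_deepvariant \\
  --model_type=WES \\
  --ref=/input/testinput/human_g1k_v37_modified.fasta \\
  --reads=/input/${sample}.sorted.markdup.BQSR.bam \\
  --regions /input/testinput/use_agilent_region_padding_100.bed \\
  --output_vcf=/output/${sample}.vcf.gz \\
  --output_gvcf=/output/${sample}.g.vcf.gz \\
  --intermediate_results_dir /output/intermediate_results_dir/${sample} \\
  --num_shards=10 \\
  --make_examples_extra_args "vsc_min_fraction_indels=${indel_frac},vsc_min_fraction_snps=${snp_frac}"
EOF
)
```

Printing the command first makes it easy to review the final flag string before launching a long run.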

Wasya-the-Wolf commented 3 months ago

Thank you very much for your help! Before adding parameters, I have a small question: What does the vsc_min_fraction parameter do?

kishwarshafin commented 3 months ago
time docker run -it \
google/deepvariant:1.6.1 \
 /opt/deepvariant/bin/make_examples --helpfull | grep 'vsc_min_fraction_snps' -A 5

Shows:

--vsc_min_fraction_snps: SNP alleles occurring at least this fraction of all
    counts in our AlleleCount will be advanced as candidates.
    (default: '0.12')

You can see the full set of parameters by removing the grep. Please also consider reading about how DeepVariant works, and the DeepVariant manuscript (https://www.nature.com/articles/nbt.4235), for more details.
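To make the "at least this fraction" wording concrete, here is a small arithmetic sketch. `passes_fraction` is a hypothetical helper, not a DeepVariant command, and it looks only at the fraction rule quoted in the help text; make_examples may apply further criteria, which the full `--helpfull` output lists.

```shell
# Illustrative only: apply the "at least this fraction" rule to raw read counts.
# Usage: passes_fraction ALT_READS TOTAL_READS MIN_FRACTION
# Prints "yes" if ALT_READS / TOTAL_READS >= MIN_FRACTION, otherwise "no".
passes_fraction() {
  awk -v alt="$1" -v total="$2" -v min="$3" 'BEGIN {
    if (total <= 0) { print "no"; exit }   # no coverage, nothing to propose
    result = (alt / total >= min) ? "yes" : "no"
    print result
  }'
}

passes_fraction 3 30 0.12   # 3/30 = 0.10, below the 0.12 default
passes_fraction 9 30 0.12   # 9/30 = 0.30, above the 0.12 default
```

By this rule alone, an SNP at a VAF of 0.3 clears the default of 0.12, which is why the defaults would be expected to let such sites through as candidates.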

Wasya-the-Wolf commented 3 months ago

OK, thank you!

kishwarshafin commented 3 months ago

I also recommend reading this FAQ entry: https://github.com/google/deepvariant/blob/r1.6.1/docs/FAQ.md#why-does-deepvariant-not-call-a-specific-variant-in-my-data

I will close this issue for now. Please reopen if you have further questions.