Closed Yijun-Tian closed 2 years ago
Hi @Yijun-Tian , The warning are trivial, you can ignore those.
What specifically looks off to you about the AD field? Please know that not all reads will be considered here, reads with low base quality at that site and with low mapping quality will be excluded from this calculation. You can look at the site on IGV and color the reads based on their mapping quality to see if that's the issue here.
Hi Kishwar, It's great to hear that the warning messages are OK. I just reviewed the BAM coverage and indeed there are some low quality bases there. But I expect the allelic depth field should present as "ref,alt" form for heterozygote variants in the file. But I checked and found such form only appear for a small proportion of the variants, which are all with multiple alternative alleles, see belows:
(base) user@user-x11dai-n:/mnt/raid0/Yijun_Tian/ONT_Open_data$ grep -v "#" FAQ32172Hap.vcf | cut -f7 | sort | uniq -c
1377524 PASS
1457745 refCall
(base) user@user-x11dai-n:/mnt/raid0/Yijun_Tian/ONT_Open_data$ grep -v "#" FAQ32172Hap.vcf | cut -f5 | grep "," | wc -l
5249
(base) user@user-x11dai-n:/mnt/raid0/Yijun_Tian/ONT_Open_data$ grep -v "#" FAQ32172Hap.vcf | cut -f10 | grep "," | wc -l
5249
(base) user@user-x11dai-n:/mnt/raid0/Yijun_Tian/ONT_Open_data$ grep -v "#" FAQ32172Hap.vcf | grep "," | head
1 981098 . G T,A 53.1 PASS . GT:GQ:DP:AD:VAF:C:PS 2/2:3:4:2,2:0.5,0.5:DV:.
1 1019806 . G T,A 4.6 PASS . GT:GQ:DP:AD:VAF:C:PS 0/2:2:4:2,2:0.5,0.5:DV:.
1 1077827 . C G,T 45.8 PASS . GT:GQ:DP:AD:VAF:C:PS 2/2:2:5:2,2:0.4,0.4:DV:.
1 1265603 . TTGTGTGTGTGTGTGTGTGTG T,TTGTGTGTGTGTGTGTGTGTGTGTGTGTG 13.5 PASS . GT:GQ:DP:AD:VAF:C:PS 0/2:4:8:2,2:0.25,0.25:DV:.
1 1595896 . G GAC,C 10.6 PASS . GT:GQ:DP:AD:VAF:C:PS 1/1:5:5:2,2:0.4,0.4:DV:.
1 2235710 . C T,A 52 PASS . GT:GQ:DP:AD:VAF:C:PS 2/2:6:4:2,2:0.5,0.5:DV:.
1 2660634 . G C,A 17 PASS . GT:GQ:DP:AD:VAF:C:PS 1/1:17:7:5,2:0.714,0.286:P:.
1 3361002 . TTG TTCTG,T 7.8 PASS . GT:GQ:DP:AD:VAF:C:PS 1/2:2:8:2,2:0.25,0.25:DV:.
1 3619116 . G GTC,C 8.2 PASS . GT:GQ:DP:AD:VAF:C:PS 0/2:2:4:2,2:0.5,0.5:DV:.
1 3815090 . C T,CT 7.4 PASS . GT:GQ:DP:AD:VAF:C:PS 0/2:2:6:3,3:0.5,0.5:DV:.
Hi @Yijun-Tian, Lets take this call as an example:
1 1019806 . G T,A 4.6 PASS . GT:GQ:DP:AD:VAF:C:PS 0/2:2:4:2,2:0.5,0.5:DV:.
The ref is G
, the alt-alleles at this position are T
and A
.
DP: Depth of the site is 4
AD: Allelic depth of T, A
is 2,2
meaning T
is supported by 2 reads and A
is supported by 2 reads.
VAF: Variant allele frequency = AD/DP are similarly 0.5 and 0.5
Does this clarify anything?
Hi Kishwar, I think based on your description, this field should AC instead of AD. In below specs, the AD field is for "Total read depth for each allele" But the AC field for "Allele count in genotypes, for each ALT allele, in the same order as listed" Check v4.3 page 8
https://samtools.github.io/hts-specs/VCFv4.1.pdf https://samtools.github.io/hts-specs/VCFv4.2.pdf https://samtools.github.io/hts-specs/VCFv4.3.pdf
Hi @Yijun-Tian ,
v4.3 specifications were out on April 2022, we released this in March. In version v4.1 and v4.2 you won't see any specification for AD under "Fixed fields" section.
Thank you for bringing this to our attention tough, it's very helpful.
Hello, I ran the docker program and get some PEPPER_MARGIN_DEEPVARIANT_FINAL_OUTPUT.vcf.gz file. But the AD field doesn't look right:
Should I expect the AD to follow a pattern of "A,B"? I checked the logs and found a few error or warning messages: 1_pepper.log and 3_pepper_hp.log:
/usr/local/lib/python3.8/dist-packages/torch/onnx/symbolic_opset9.py:2095: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
warnings.warn("Exporting a model to ONNX with a batch_size other than 1, " + 5_deepvariant.log:Is there anything wrong?
Thanks