Closed sophienguyen01 closed 1 year ago
Hi Sophie,
You have a few options:
1) The first option is, as Andrew mentioned in a previous post, to run DeepVariant on each sample, use GLnexus to merge the results, and identify the de novo mutations from the joint call file.
2) DeepTrio outputs the child VCF at the path given by the --output_vcf_child flag. You would then need to compare those variants against the parents, across multiple samples (with some truth sets), to ensure they are truly de novo mutations (DNMs) and not false positives. That is quite a bit of work to perform properly.
3) You can use external tools, of which there are many :)
Hope it helps, Paul
Thank you for your answer
I will close this issue now. If you have further questions feel free to reopen. Thank you @pgrosu.
Hi @pgrosu,
For option 1, after merging into a joint call file, which tool can I use to call de novo variants? According to this tutorial, RTG mendelian is only able to calculate the non-Mendelian rate, not identify de novo mutations.
For option 3, did you mean using another tool after DeepTrio has produced a VCF for each sample? I have done a bit of research and there are not that many, and most of the tools are old and not well maintained. Can you list some recommendations here?
Thanks, Sophie
Hi Sophie,
So as you know, besides the genetic information passed on from the parents, each of us is born with a small number of additional novel genetic changes called de novo mutations (e.g. from environmental effects). These variants are not inherited from the parents, and therefore violate Mendelian inheritance.
So when you use rtg-tools mendelian with the --output flag, it will save an updated VCF file annotated with calls violating Mendelian inheritance, thus highlighting the de novo mutations. The header of these annotated VCF files will contain the following INFO and FORMAT fields:
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=MCV,Number=.,Type=String,Description="Variant violates mendelian inheritance constraints">
##INFO=<ID=MCU,Number=.,Type=String,Description="Mendelian consistency status can not be determined">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DN,Number=1,Type=String,Description="De novo allele">
##FORMAT=<ID=MCP,Number=.,Type=String,Description="Describes the expected genotype ploidy in cases where the given genotype does not match the expected ploidy">
Each de novo call that violates Mendelian inheritance will be annotated like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT father mother son1 son2 daughter1 daughter2-initial daughter2
Chr1 4917 . A G . . MCV=daughter2:0|0+0|0->0|1 GT:DN 0|0 0|0 0|0 0|0 0|0 0|0 0|1:Y
Chr1 15214 . G C . . MCV=daughter2:0|0+0|0->1|0 GT:DN 0|0 0|0 0|0 0|0 0|0 0|0 1|0:Y
Chr2 4883 . T G . . MCV=daughter2:0|0+0|0->0|1 GT:DN 0|0 0|0 0|0 0|0 0|0 0|0 0|1:Y
Chr2 11369 . G A . . MCV=daughter2:0|0+0|0->0|1 GT:DN 0|0 0|0 0|0 0|0 0|0 0|0 0|1:Y
Chr3 11754 . A G . . MCV=daughter2:0|0+0|0->0|1 GT:DN 0|0 0|0 0|0 0|0 0|0 0|0 0|1:Y
Chr4 37470 . C T . . MCV=daughter2:0|0+0|0->1|0 GT:DN 0|0 0|0 0|0 0|0 0|0 0|0 1|0:Y
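To pull just the flagged calls out of such an annotated file, a small script can scan the DN FORMAT field. The following is a minimal sketch, not part of rtg-tools: the function name `iter_de_novo_calls` is hypothetical, and it assumes a plain-text (uncompressed) VCF whose columns are tab-separated, falling back to whitespace splitting for text pasted like the example above.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_de_novo_calls(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (chrom, pos, ref, alt, sample) for every sample whose DN field is 'Y'.

    Hypothetical helper: it only reads the DN FORMAT annotation described above.
    """
    samples: List[str] = []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t") if "\t" in line else line.split()
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        keys = fields[8].split(":")
        if "DN" not in keys:
            continue  # this record carries no de novo annotation
        dn_index = keys.index("DN")
        for sample, value in zip(samples, fields[9:]):
            parts = value.split(":")
            # Trailing FORMAT fields may be dropped, so check the length first.
            if len(parts) > dn_index and parts[dn_index] == "Y":
                yield fields[0], int(fields[1]), fields[3], fields[4], sample
```

For a file on disk, something like `with open("annotated.vcf") as fh: calls = list(iter_de_novo_calls(fh))` would collect the calls; the file name here is only a placeholder.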
Below are a few tools that can also perform trio analysis (generating their own VCF), or can perform VCF refinement based on pedigree information:
The key point to take away from this is not that there are options, but how these options internally work to infer the genotype and its probability given the data. Some work better with longer reads, and some with shorter reads. You want to play with them to get a feel of what is happening given different data. If you are curious, you can read the papers and mathematics behind each approach, and you'll be surprised by their similarity in approaches of inferring the call and its probability (quality). I have included a list of papers with links in the reference section below.
Now if the above is too easy, and you want to make de novo variant calling more exciting, you can run glnexus with --config DeepVariant_unfiltered, which is basically a YAML config file telling GLnexus to operate under specific parameters. When you perform GLnexus joint variant calling, you will get the three sample columns (father/mother/child) in your joint VCF. To determine a de novo call, you just look for genotypes that do not follow Mendelian inheritance, such as 0/0 0/0 0/1, for example:
chr7 54624683 chr7_54624683_A_AATC A AATC 27 . AF=0.166667;AQ=27 GT:DP:AD:GQ:PL:RNC 0/0:39:22,16:28:27,0,48:.. 0/0:40:40,0:50:0,120,1199:.. 0/1:28:28,0:50:0,90,899:..
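The genotype check on a joint VCF line like this one can be sketched in a few lines of Python. This is a rough illustration rather than anything GLnexus provides: the function names are hypothetical, the sample columns are assumed to be ordered father, mother, child, and only diploid (autosomal) genotypes are handled. Note that it flags any Mendelian-inconsistent genotype, which is a superset of true de novo events.

```python
import re
from typing import Optional, Tuple

Genotype = Tuple[int, int]


def parse_gt(sample_field: str) -> Optional[Genotype]:
    """Return the diploid genotype from a sample column, or None if missing."""
    gt = sample_field.split(":")[0]  # GT is the first FORMAT key when present
    alleles = re.split(r"[/|]", gt)
    if len(alleles) != 2 or not all(a.isdigit() for a in alleles):
        return None  # missing ('./.') or non-diploid call
    return int(alleles[0]), int(alleles[1])


def violates_mendelian(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child cannot have received one allele from each parent."""
    a, b = child
    consistent = (a in father and b in mother) or (a in mother and b in father)
    return not consistent


def is_mendelian_violation(vcf_line: str, father_col: int = 9,
                           mother_col: int = 10, child_col: int = 11) -> bool:
    """Check one joint-VCF record; the column indices are assumptions."""
    fields = vcf_line.split("\t") if "\t" in vcf_line else vcf_line.split()
    gts = [parse_gt(fields[i]) for i in (father_col, mother_col, child_col)]
    if any(g is None for g in gts):
        return False  # cannot decide when any genotype is missing
    father, mother, child = gts
    return violates_mendelian(father, mother, child)
```

Records with a missing genotype are skipped rather than flagged, on the reasoning that a no-call in a parent is not evidence of a de novo event.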
Though keep in mind DeepTrio/GLnexus might produce false positives, based on low read quality (low MAPQ) or other factors such as over-representation of multi-site aligned reads, where such a call might be labeled 0/1 0/0 0/0 while IGV gives more support to a call of 0/1 0/1 0/0. Otherwise, if the read quality is good and the alignments are unique with proper coverage, then it might actually be de novo, though the proband (child) calls are the more interesting ones. For this you would need to have more samples to ensure the calls are not false positives, with further IGV inspection and assay validation. If this might be a bit too fun, feel free to skip it, but it's here if you are curious to dive deeper into the possible de novo calls from DeepTrio/GLnexus.
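Before opening IGV, a crude first pass over the FORMAT fields can already weed out some suspicious candidates. The sketch below assumes the GT:DP:AD:GQ:PL:RNC layout of the GLnexus example line; the function name and every threshold (`min_depth`, `min_gq`, `max_parent_alt_reads`, `min_child_alt_reads`) are hypothetical placeholders to tune on your own data, not recommended values. It does not look at MAPQ or multi-mapping, which need the BAM files.

```python
from typing import Dict, List, Optional


def _to_int(value: str) -> Optional[int]:
    try:
        return int(value)
    except ValueError:
        return None


def _alt_reads(fields: Dict[str, str]) -> Optional[int]:
    """Sum of alt-supporting reads from AD (ref,alt1,...), or None if unusable."""
    counts = [_to_int(x) for x in fields.get("AD", "").split(",")]
    if len(counts) < 2 or None in counts:
        return None
    return sum(counts[1:])


def de_novo_filter_failures(fmt: str, father: str, mother: str, child: str,
                            min_depth: int = 10, min_gq: int = 20,
                            max_parent_alt_reads: int = 0,
                            min_child_alt_reads: int = 1) -> List[str]:
    """Return the reasons a candidate looks unreliable (empty list = passes)."""
    keys = fmt.split(":")
    trio = {name: dict(zip(keys, col.split(":")))
            for name, col in (("father", father), ("mother", mother), ("child", child))}
    failures: List[str] = []
    for name, fields in trio.items():
        depth = _to_int(fields.get("DP", ""))
        gq = _to_int(fields.get("GQ", ""))
        if depth is None or depth < min_depth:
            failures.append(f"{name}: depth below {min_depth}")
        if gq is None or gq < min_gq:
            failures.append(f"{name}: GQ below {min_gq}")
    for name in ("father", "mother"):
        alt = _alt_reads(trio[name])
        if alt is None or alt > max_parent_alt_reads:
            failures.append(f"{name}: {alt} alt-supporting reads")
    child_alt = _alt_reads(trio["child"])
    if child_alt is None or child_alt < min_child_alt_reads:
        failures.append(f"child: {child_alt} alt-supporting reads")
    return failures
```

Applied to the example line above, the first sample column is called 0/0 yet its AD field (22,16) shows 16 alt-supporting reads, so this check would flag the site for manual review instead of accepting it as de novo.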
Basically the big idea is to take it slow and have fun to get the most out of it, as with many moving parts (programs + parameters) and varied data you want to be confident in the calls - which can take a lot of finesse. With super-clean data, that's not such a big deal - but that's not why we use these tools :)
Hope it helps, Paul
[1] RTG Tools Manual
[2] dv-trio: a family-based variant calling pipeline using DeepVariant
[3] FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units
[4] DeepTrio: Variant Calling in Families Using Deep Learning
[5] A unified haplotype-based method for accurate and comprehensive variant calling (This is the Octopus paper.)
[6] DeNovoGear: de novo indel and point mutation discovery and phasing
Thank you @pgrosu, I totally overlooked the modified VCF output from rtg mendelian. This is exactly what I am looking for.
Hi Sophie,
You are very welcome, and that is absolutely understandable. Feel free to reach out again if you have more questions.
Paul
Hi,
I want to ask if DeepTrio provides an option to generate de novo variants. I ran the tool using the exact files provided in the tutorial and received 3 VCF and 3 gVCF files, but that's about it. How can I get the de novo variants from the child VCF output?
Thanks