Kaitlin has written a Python tool that can serve as a model. From Kaitlin:
The most recent version is attached (3.93). The biggest issue that I've yet to resolve is how to handle multi-allelic lines above tri-allelic. Gets into nightmare territory quickly.
To run this, you'll need three things:
1) VCF of interest
2) PED file for the families in the VCF
3) The ESP variant counts file that I made (.gz for the moment since it is so large)
You can find this file here: /humgen/atgu1/fs03/wip/kaitlin/all_ESP_counts_5.28.13.txt
The command line argument should look like this:
python de_novo_finder_3.py all_ESP_counts_5.28.13.txt
I suggest specifying an output file. There are a few optional flags that you can use to adjust things in the script.
-v, --annotatevariants_VEP: If you have VEP annotations in the ANNOTATION column of the VCF, this will pull out and print the gene name and mutation type.
-t, --thresh: The PL threshold set for the next most likely genotype in the child. This gets at how confident you want the het call to be in the child. Default is a PL threshold of 20, but you can adjust that up or down if you'd like.
-c, --minchildAB: I require that the heterozygous child has at least 20% alternative reads. You can adjust that with this flag.
-d, --depthratio: I require that the depth of coverage in the child is at least 1/10th that of the combined parental depth. You can adjust that with this flag (integer input for the x in 1/x).
-m, --prob(dn)metric: The minimum p(DN) that you will accept. I have it set at 0.05 and you could adjust it up. Due to the validation likelihoods that were added, you won't get anything below 0.05.
-p, --maxparentAB: I require that the parents, who should both be homozygous reference, have no more than 5% alternative reads. You can adjust that with this flag.
-a, --annotatevariants: If you have SnpEff annotations in the ANNOTATION column of the VCF, this will pull out and print the gene name and mutation type.
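
For anyone porting this, here is a minimal sketch of how the read-level filters described by the flags above might combine for a single trio at a single site. It is not Kaitlin's code: the `Sample` class, the `passes_filters` function, and the parameter names are all hypothetical, and it assumes PLs are normalized so that the called genotype has PL 0. The p(DN) cutoff (-m) depends on the ESP counts and the validation likelihoods, so it is left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Sample:
    """Hypothetical per-sample summary at one biallelic site."""
    ref_reads: int
    alt_reads: int
    pl: Tuple[int, int, int]  # PLs for (hom-ref, het, hom-alt); assumed normalized to min 0

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def alt_fraction(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filters(child, father, mother,
                   thresh=20,           # -t: PL of the child's next most likely genotype
                   min_child_ab=0.20,   # -c: minimum alt read fraction in the child
                   max_parent_ab=0.05,  # -p: maximum alt read fraction in each parent
                   depth_ratio=10):     # -d: child depth >= combined parental depth / x
    # Child must be called het, and the runner-up genotype must have PL >= thresh.
    # (Assumption: this is what "PL threshold for the next most likely genotype" means.)
    if child.pl[1] != 0 or min(child.pl[0], child.pl[2]) < thresh:
        return False
    # Child needs enough alternative reads to support the het call.
    if child.alt_fraction < min_child_ab:
        return False
    # Both parents should look homozygous reference, with very few alt reads.
    if father.alt_fraction > max_parent_ab or mother.alt_fraction > max_parent_ab:
        return False
    # Child coverage must be at least 1/depth_ratio of the combined parental depth.
    if child.depth * depth_ratio < father.depth + mother.depth:
        return False
    return True


if __name__ == "__main__":
    child = Sample(ref_reads=30, alt_reads=20, pl=(200, 0, 300))
    father = Sample(ref_reads=40, alt_reads=0, pl=(0, 60, 600))
    mother = Sample(ref_reads=50, alt_reads=1, pl=(0, 50, 500))
    print(passes_filters(child, father, mother))
```

The defaults mirror the ones quoted above (PL 20, 20% child AB, 5% parent AB, 1/10 depth ratio). Multi-allelic sites are deliberately not handled here.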
From @jbloom22 on September 1, 2015 18:20
Copied from original issue: cseed/hail#44