Open freedomq8 opened 4 years ago
@freedomq8 all association tests are nominal p-values. You can take them and do your own multiple testing control (Bonferroni or FDR)
Thanks for the quick response. The way I correct it is dividing 0.05/total number of variants and that would be the threshold. A follow up question; I want to do fisher exact test on mitochondrial variants (mitochondria is haploid although heteroplasmy do exists) and wanted to know if the current fisher test in vtools can be used for that or I need to change amend it ?
thanks
Current Fisher test assumes additive model so it essentially just counts minor alleles. As long as for a variant under the null your samples are homogeneous in terms of ploidy (ie, for the same variant they are all haploid) it should be fine. For case with heteroplasmy I'm thinking maybe those are the events you are interested in detecting -- I'm curious what'd the numeric coding for genotypes be like for a sample with different mtDNA allele for a variant, due to heteroplasmy. what does the data look like?
thanks for getting back to me. looking at the equation of the fisher test: Fisher_exact(num_var_alleles_case, num_var_alleles_ctrl, 2num_gt_case, 2num_gt_ctrl). its counting the number of variant alleles for both cases and control. so the question is since mitochondrial variants are majority homozygous as its haploid wouldn't that effect the p value? and yes all the samples should be homogeneous so doesn't that mean it should be fine ? heteroplasmpy variants are not of interest of my fields, these can be not true due to mapping/caller tool issues or true.
`
chr | pos | ref | alt | maf | refGene_name2 | het | hom | mut_type | num_var_alleles_case | num_gt_case | num_var_alleles_ctrl | num_gt_ctrl | prop_pval | hom_case | het_case | hom_ctrl | het_ctrl | total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MT | 6419 | A | C | 0.035484 | . | 22 | 0 | . | 14 | 532 | 8 | 88 | 0.007624 | 0 | 14 | 0 | 8 | 620 |
MT | 7521 | G | A | 0.107492 | . | 12 | 27 | . | 57 | 527 | 9 | 87 | 1 | 24 | 9 | 3 | 3 | 614 |
MT | 146 | T | C | 0.291935 | . | 3 | 89 | . | 141 | 532 | 40 | 88 | 0.001735 | 70 | 1 | 19 | 2 | 620 |
MT | 16183 | - | C | 0.038596 | . | 10 | 6 | . | 16 | 488 | 6 | 82 | 0.11482 | 4 | 8 | 2 | 2 | 570 |
` data sorted by heteroplasmy so you can have an idea but they are not the common but they do exists. On igv they would look like regular autosomal heterozgyous some reads wildtype some they have snp. heteroplasmy can be inherited but with different percentages in mitochondrial population or acquired with age.
it would be a nice feature do include for mitochondria a multivariate regression for association test. Our station is doing this tests on R but i have to change the VCF genotype and none of the tests results in signficant results including fisher test. But i do get significant results with vtools some am not sure where causing this; is it the way he is doing analysis or is it fisher test in vtools not optimised for mitochondrial variants
and yes all the samples should be homogeneous so doesn't that mean it should be fine ?
You mean the genotype coding is either 0 or 2 for haploid, and is 0, 1 or 2 for heteroplasmpy? then that's problematic for haploid, because you count each haplotype twice. It in effect doubled your sample size in a biased way. For example:
> fisher.test(matrix(c(0,10,10,10),2,2))$p.value
[1] 0.01099374
> fisher.test(matrix(c(0,5,5,5),2,2))$p.value
[1] 0.1008991
You see p-values are different. I think you need to do something like hom_count/2
to use with your Fisher's test. For heteroplasmpy it should work as is although you are not interested in them. So maybe even filter those variants before you apply /2
or ignore them when you interpret the results?
include for mitochondria a multivariate regression for association test.
what do you mean by multivariate regression -- you mean multiple regression? For regression analysis allele coding will not matter to p-value although your regression coefficient will have different interpretation.
This is the vcf for the mitochondria variants called with varscan, gatk haplocaller, and strelka the have the same format of vcf. below a snp where some samples are wildtype, missing,homoplasmy,heterplasmy Example 1: `
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- MT | 57 | . | T | TC,C | 115249.7 | . | AC=85,4;AF=0.050,2.331e-03;AN=1716;BaseQRankSum=-1.479e+00;ClippingRankSum=0.00;DP=32708;ExcessHet=0.0000;FS=1.413;InbreedingCoeff=0.9862;MLEAC=86,4;MLEAF=0.050,2.331e-03;MQ=55.62;MQRankSum=0.831;NDA=2;QD=30.83;ReadPosRankSum=0.466;SOR=1.077 | GT:AD:DP:GQ:PGT:PID:PL | ./.:24,0,0:24:.:.:.:0,0,0,0,0,0 | 0/0:47,0,0:47:99:.:.:0,106,1440,106,1440,1440 | ./.:44,0,0:44:.:.:.:0,0,0,0,0,0 | 1/1:0,65,0:65:99:1|1:57_T_TC:2908,202,0,2908,202,2908 | 2/2:0,0,115:115:99:1|1:55_T_C:5175,5175,5175,346,346,0 | 0/1:9,17,0:26:99:0|1:57_T_TC:527,0,243,554,294,849
` homoplasmy 1/1 , heteroplasmy 0/1 (or 0/2 ,1/2 for allele with two alternate alleles for more the one sample) and wildtype 0/0 and missing ./.
Example 2: another caller tool gatk mutect2 used but use its with vtools yet has a different VCF format: homoplasmy and heteroplasmy have the same genotype code 0/1 or 0|1 and wild type 0/0 and missing ./.:
so will example 1 is compatible with current fisher test or I need to adjust for it. and any feedback on example 2 with vtools?
for example 1: if i need to change the forumla for haploid, what shall i change in the codes below ?
`vtools update variant --from_stat 'total=#(GT)' 'num=#(alt)' 'het=#(het)' 'hom=#(hom)' 'other=#(other)' 'minDP=min(DP_geno)' 'maxDP=max(DP_geno)' 'meanDP=avg(DP_geno)' 'maf=maf()' 'missing=#(missing)'
vtools update variant --from_stat 'num_gt_case=#(GT)' 'num_var_alleles_case=#(alt)' --samples phenotype=1
vtools update variant --from_stat "num_gt_ctrl=#(GT)" "num_var_alleles_ctrl=#(alt)" --samples phenotype=0
vtools update variant --from_stat 'hom_case=#(hom)' 'het_case=#(het)' --samples phenotype=1
vtools update variant --from_stat "hom_ctrl=#(hom)" "het_ctrl=#(het)" --samples phenotype=0
vtools update variant --set "prop_pval=Fisher_exact(num_var_alleles_case, num_var_alleles_ctrl, 2num_gt_case, 2num_gt_ctrl)"`
Also here is the top p-value with their genotype and size number: `
chr | pos | ref | alt | maf | refGene_name2 | het | hom | mut_type | num_var_alleles_case | num_gt_case | num_var_alleles_ctrl | num_gt_ctrl | prop_pval | hom_case | het_case | hom_ctrl | het_ctrl | total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MT | 279 | T | C | 0.009649 | . | 1 | 5 | . | 11 | 377 | 0 | 763 | 4.93E-06 | 5 | 1 | 0 | 0 | 1140 |
MT | 15169 | A | G | 0.020833 | . | 0 | 12 | . | 18 | 379 | 6 | 773 | 2.67E-05 | 9 | 0 | 3 | 0 | 1152 |
MT | 5501 | A | G | 0.034662 | . | 0 | 20 | . | 26 | 379 | 14 | 775 | 4.88E-05 | 13 | 0 | 7 | 0 | 1154 |
` the size of samples is correct Appreciate your help and feedback
Regardin the regression test its Multivariate (adjusting for covariate i.e sex, age and haplogroups) logistic regression analysis. used in this paper (4th figure)
Hi there,
I am using the fisher exact test following the instructions on http://varianttools.sourceforge.net/Association/ExactTest . I am using it for autosomal and mitochondrial variants and wanted to know if the p-value under prop_pval column is it adjusted for multiple testing and/or Bonferroni corrected ? Thanks