I'm analyzing the rare variants in msigdb pathways using set based input.
Example of the setFile GOBP_MONOCARBOXYLIC_ACID_METABOLIC_PROCESS 1:10399063-10399125,1:10399063-10399330,1:10399628-10399704,1:10400392-10400572,1:10403070-10403136,1:10404160-10404279,1:10408070-10408140,1:10411417-10411552,1:10413061-10413251,1:10416986-10417117,1:10417375-10417509,1:10418825-10418925,1:10419416-10419539,1:10419629-10420511,1:109656075-109656425,1:109656098-109656425,1:109656711-109656787,1:109657214-109657279,1:109657589-109657671,1:109657771-109657872,1:109658813-109658909,1:109658999-109659110,1:109661164-109661700,1:109661164-109661709,1:109665010-109665497,1:109668021-109668151,1:109668056-109668151,1:109668424-109668500,1:109668924-109668989,1:109669289-109669371,1:109669470-109669571,1:109671286-109671382,1:109671472-109671583,1:109674746-109675286,1:109681808-109681834,1:109683206-109683997,.....................
And for some of the pathways, as the one above, I am seeing the N00 and N10 values (counts of ref allele in controls n cases, respectively) being less than N10 and N11 respectively (counts of alt allele in controls n cases, respectively).
Can you help me understand when this could be happening?
This is the output of the CMCFisherExact.assoc excluding the RANGE column, using --freqUpper 0.01 and all other default options (such as --impute mean). N00 is 22 and N10 is 400, thus N00 < N10 and N01 is 2 and N11 is 59, thus N01 < N11.
I'm wondering if I've prepared my input vcf incorrectly perhaps. I'm finding it hard to find the exact variants being included in this set so that I can see the genotypes of the cases and controls. Would appreciate your insights.
I'm analyzing the rare variants in msigdb pathways using set based input.
Example of the
setFile
GOBP_MONOCARBOXYLIC_ACID_METABOLIC_PROCESS 1:10399063-10399125,1:10399063-10399330,1:10399628-10399704,1:10400392-10400572,1:10403070-10403136,1:10404160-10404279,1:10408070-10408140,1:10411417-10411552,1:10413061-10413251,1:10416986-10417117,1:10417375-10417509,1:10418825-10418925,1:10419416-10419539,1:10419629-10420511,1:109656075-109656425,1:109656098-109656425,1:109656711-109656787,1:109657214-109657279,1:109657589-109657671,1:109657771-109657872,1:109658813-109658909,1:109658999-109659110,1:109661164-109661700,1:109661164-109661709,1:109665010-109665497,1:109668021-109668151,1:109668056-109668151,1:109668424-109668500,1:109668924-109668989,1:109669289-109669371,1:109669470-109669571,1:109671286-109671382,1:109671472-109671583,1:109674746-109675286,1:109681808-109681834,1:109683206-109683997,.....................
And for some of the pathways, as the one above, I am seeing the N00 and N10 values (counts of ref allele in controls n cases, respectively) being less than N10 and N11 respectively (counts of alt allele in controls n cases, respectively). Can you help me understand when this could be happening?
This is the output of the
CMCFisherExact.assoc
excluding theRANGE
column, using--freqUpper 0.01
and all other default options (such as --impute mean). N00 is 22 and N10 is 400, thus N00 < N10 and N01 is 2 and N11 is 59, thus N01 < N11.I tried to see if avoiding imputation for missing genotypes would fix this (--impute drop). But N00 is 2 and N10 is 73, thus N00 < N10.
I'm wondering if I've prepared my input vcf incorrectly perhaps. I'm finding it hard to find the exact variants being included in this
set
so that I can see the genotypes of the cases and controls. Would appreciate your insights.