Open qserenali opened 5 years ago
Does the burden test support grouping by KeggID, for example? Thanks!
@qserenali it should support grouping by anything.
Surprisingly, only one gene set returned association results
How about those 4 gene sets that used to work on chromosome 22? Taken at face value, the error message simply says that all your samples are missing variants in the genes from that pathway, which doesn't seem likely, right? One way to debug: first create a variant table for one of the gene sets in question, then count the number of variants and the number of samples in it.
Thanks for the tip. I think I figured it out. When I created the project from the child projects, vtools thinks there are 811*22 = 17842 samples, and hence missing = 811*22 - 809 = 17033 for the first row of data below. In the sample table, sample_name is the same across the data from the 22 imports. Is there a way to let vtools know that those data belong to the same subject? Or do I have to import the VCF using one file only? I am surprised, though, that one gene set made it through.
keggpathway_kgid  sample_size_CFisher  num_variants_CFisher  total_mac_CFisher  statistic_CFisher  pvalue_CFisher
hsa00472          803                  7                     13                 1.14471            0.778341
vtools init main --children ../chr1 ../chr2 ../chr3 ../chr4 ../chr5 ../chr6 ../chr7 ../chr8 ../chr9 ../chr10 ../chr11 ../chr12 ../chr13 ../chr14 ../chr15 ../chr16 ../chr17 ../chr18 ../chr19 ../chr20 ../chr21 ../chr22

[qli2@awsahenva1007 main]$ vtools output hsa04740_variants chr pos ref alt total wildtype mutants missing num hom het other -l 20
1  949472  G  A  809  809  0   17033  0   0  0   0
1  949491  G  A  809  808  1   17033  1   0  1   0
1  949597  C  T  809  789  20  17033  20  0  20  0
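As a quick sanity check on the numbers above, here is a minimal Python sketch of the arithmetic. It assumes 811 subjects in each of the 22 child projects (the 811 is read off this thread, not reported by vtools directly) and that every merged sample row is counted as a separate sample. The names `SUBJECTS`, `CHILD_PROJECTS`, and `expected_missing` are made up for illustration.

```python
# Assumption: 811 subjects, each imported once per child project (22 projects).
SUBJECTS = 811
CHILD_PROJECTS = 22

# If every (subject, child project) pair is treated as its own sample,
# the merged project contains this many sample rows.
sample_rows = SUBJECTS * CHILD_PROJECTS


def expected_missing(total_genotyped):
    """Sample rows without a genotype for a variant genotyped in `total_genotyped` rows."""
    return sample_rows - total_genotyped


# The first output row reports total=809 and missing=17033.
print(sample_rows, expected_missing(809))
```

Under these assumptions the `total` and `missing` columns of each row should add up to the same number of sample rows.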
[qli2@awsahenva1007 main]$ vtools show samples -l 25
sample_name  filename                    phenotype  affection  _merge_from
100016       /home/ql....recode.vcf.gz   Case       2          chr1
100016       /home/ql....recode.vcf.gz   Case       2          chr2
100016       /home/ql....recode.vcf.gz   Case       2          chr3
100016       /home/ql....recode.vcf.gz   Case       2          chr4
100016       /home/ql....recode.vcf.gz   Case       2          chr5
100016       /home/ql....recode.vcf.gz   Case       2          chr6
100016       /home/ql....recode.vcf.gz   Case       2          chr7
100016       /home/ql....recode.vcf.gz   Case       2          chr8
100016       /home/ql....recode.vcf.gz   Case       2          chr9
100016       /home/ql....recode.vcf.gz   Case       2          chr10
100016       /home/ql....recode.vcf.gz   Case       2          chr11
100016       /home/ql....recode.vcf.gz   Case       2          chr12
100016       /home/ql....recode.vcf.gz   Case       2          chr13
100016       /home/ql....recode.vcf.gz   Case       2          chr14
100016       /home/ql....recode.vcf.gz   Case       2          chr15
100016       /home/ql....recode.vcf.gz   Case       2          chr16
100016       /home/ql....recode.vcf.gz   Case       2          chr17
100016       /home/ql....recode.vcf.gz   Case       2          chr18
100016       /home/ql....recode.vcf.gz   Case       2          chr19
100016       /home/ql....recode.vcf.gz   Case       2          chr20
100016       /home/ql....recode.vcf.gz   Case       2          chr21
100016       /home/ql....recode.vcf.gz   Case       2          chr22
100387       /home/ql....recode.vcf.gz   Case       2          chr1
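To see how many sample rows share a name in a listing like the one above, a small Python sketch along these lines might help. It is not part of vtools: it only assumes the listing has been saved as whitespace-separated text with the sample name in the first column and the header line removed. The helper name `rows_per_sample` is hypothetical, and only a few rows are copied here for brevity.

```python
from collections import Counter

# A few rows copied from the listing above (header line removed).
listing = """\
100016 /home/ql....recode.vcf.gz Case 2 chr1
100016 /home/ql....recode.vcf.gz Case 2 chr2
100016 /home/ql....recode.vcf.gz Case 2 chr3
100387 /home/ql....recode.vcf.gz Case 2 chr1
"""


def rows_per_sample(text):
    """Count how many rows each sample name (first column) occupies."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


counts = rows_per_sample(listing)
# Names that appear more than once are treated as separate samples in the merged project.
duplicated = {name: n for name, n in counts.items() if n > 1}
print(duplicated)  # {'100016': 3}
```

On the full sample table, a name that shows up 22 times would indicate one row per child project for that subject.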
(Sent by email in reply to gaow's comment of Monday, September 23, 2019, on vatlab/varianttools issue #119, "gene set based burden test".)
For the single gene set that worked, it was because it contained only one gene.

[qli2@awsahenva1007 main]$ vtools output hsa00472_variants chr pos ref alt total wildtype mutants missing num hom het other -l 20
12  109283278  C  T  809  809  0  17033  0  0  0  0
12  109284027  T  C  810  808  2  17032  2  0  2  0
12  109293187  G  A  806  805  1  17036  1  0  1  0
12  109294252  C  T  811  811  0  17031  0  0  0  0
12  109278806  A  T  809  807  2  17033  2  0  2  0
12  109283273  G  A  809  808  1  17033  1  0  1  0
12  109292482  C  T  810  805  5  17032  5  0  5  0
12  109294209  C  T  811  810  1  17031  1  0  1  0
12  109294301  C  T  811  810  1  17031  1  0  1  0
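One possible reading of why a single-gene set survives while multi-chromosome sets fail can be sketched as a toy model in Python. It rests on two assumptions that are not confirmed in this thread: that each merged sample row carries genotypes only for its own chromosome, and that the test keeps only sample rows with a genotype for every variant in the group. The function name and the example rows are hypothetical, not vtools internals.

```python
def rows_with_complete_data(sample_rows, variant_chroms):
    """Return sample rows that have a genotype for every variant in the set.

    sample_rows: list of (sample_name, chromosome) tuples, one per merged row
    variant_chroms: chromosome of each variant in the gene set
    Toy assumption: a row has genotypes only for variants on its own chromosome.
    """
    needed = set(variant_chroms)
    return [row for row in sample_rows if needed <= {row[1]}]


# Illustrative rows only: two subjects, each with one row per child project.
rows = [
    ("100016", "chr1"), ("100016", "chr12"),
    ("100387", "chr1"), ("100387", "chr12"),
]

single_gene_set = ["chr12", "chr12", "chr12"]  # all variants on one chromosome
multi_chrom_set = ["chr1", "chr12"]            # pathway spanning two chromosomes

print(len(rows_with_complete_data(rows, single_gene_set)))
print(len(rows_with_complete_data(rows, multi_chrom_set)))
```

In this toy model the single-chromosome set keeps the chr12 rows, while the set spanning two chromosomes keeps none, which would match an error saying all samples are missing.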
I did a test run of the gene-set-based burden test on data from a single chromosome, knowing that genes belonging to the same pathway may be on different chromosomes. The test run worked.
Now I have created the full data set with the data from all 22 chromosomes and re-ran the analysis. Surprisingly, only one gene set returned association results, and most gave the following error message.
I see the intermediate query as follows, but I don't fully understand it. Does the burden test support grouping by KeggID, for example? Thanks!