stephenslab / gtex-eqtls

GTEx eQTL analysis
http://stephenslab.github.io/gtex-eqtls

Dealing with missing data in permutation testing #8

Open gaow opened 9 years ago

gaow commented 9 years ago

The SVD error was caused by missing data. Expression and SNP data are tissue-specific, so 1) some individuals will not have expression levels and 2) some SNPs will not be present in the data for some tissues. When the individual labels are shuffled in permutation testing, there is a good chance that either SNP or expression data will be missing, which crashes the SVD. The current fix is to set all summary statistics to NaN when this happens in a permutation run.

These "invalid" permutations should also not contribute to the computation of the empirical p-values or the median of the Bayes factors (BFs) -- is this the proper way to deal with it?

gaow commented 9 years ago

This patch implements paragraph 2 of the comment above -- it excludes those invalid permutations from the sample space.
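A minimal sketch of what excluding invalid permutations could look like, for readers following the thread. This is not the code from the patch: the function name `summarize_permutations`, the `>=` comparison, and the `+1` correction in the p-value are assumptions made for illustration.

```python
import math
from statistics import median


def summarize_permutations(true_stat, perm_stats):
    """Summarize permutation results, ignoring runs whose statistic is NaN.

    Hypothetical helper: `true_stat` is the statistic on the unpermuted data,
    `perm_stats` holds one statistic per permutation (NaN marks an invalid run).
    Returns (empirical p-value, number of valid permutations, median statistic).
    """
    # Keep only permutations that produced a usable statistic.
    valid = [s for s in perm_stats if not math.isnan(s)]
    n_valid = len(valid)
    if n_valid == 0:
        # Nothing to compare against, so no summary can be reported.
        return float("nan"), 0, float("nan")

    # Count valid permutations at least as extreme as the observed value.
    n_extreme = sum(1 for s in valid if s >= true_stat)

    # Assumed estimator: (k + 1) / (N + 1), with N counting valid runs only.
    pval = (n_extreme + 1) / (n_valid + 1)
    return pval, n_valid, median(valid)


# Two of the five permutations failed and are marked NaN.
perm = [0.5, float("nan"), -1.0, float("nan"), 0.1]
print(summarize_permutations(0.0, perm))  # (0.75, 3, 0.1)
```

With this approach the reported number of permutations is the count of valid runs, so a small count signals that the p-value rests on few samples.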

Before this patch, on a problematic data set with 100 permutations, the result is:

gene                nb.snps  join.perm.pval  nb.permutations  true.l10abf  med.perm.l10abf
ENSG00000087495.12  34       0.0891089       100              -2.87E-01    -nan

With the patch:

gene                nb.snps  join.perm.pval  nb.permutations  true.l10abf  med.perm.l10abf
ENSG00000087495.12  34       0.6             14               -2.87E-01    -2.35E-01
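As a sanity check on the two tables, both p-values are consistent with the estimator (k + 1) / (N + 1), where k = 8 permutations are at least as extreme as the observed value. Note that k = 8 and the estimator itself are inferred from the reported numbers, not stated in the thread. Under that reading, the only thing the patch changes is N: the 86 invalid permutations no longer inflate the denominator.

```python
def empirical_pval(n_extreme, n_perm):
    """Assumed estimator (k + 1) / (N + 1); inferred from the reported values."""
    return (n_extreme + 1) / (n_perm + 1)


# Before the patch: all 100 permutations counted, including the invalid ones.
before = empirical_pval(8, 100)
# With the patch: only the 14 valid permutations counted.
after = empirical_pval(8, 14)

print(f"{before:.7f}")  # 0.0891089
print(after)            # 0.6
```

If this reading is right, the pre-patch p-value was too small because invalid runs were treated as permutations that did not exceed the observed statistic.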

Just not sure if this is the right thing to do.