Dealing with missing data in permutation testing

stephenslab / gtex-eqtls

GTEx eQTL analysis

http://stephenslab.github.io/gtex-eqtls

Dealing with missing data in permutation testing #8

Open gaow opened 9 years ago

gaow commented 9 years ago

The SVD error was caused by missing data. Expression and SNP data are tissue specific, so 1) some individuals will not have expression levels in some tissues and 2) some SNPs will not be present in the data of some tissues. When the labels of individuals are shuffled in permutation testing, there is a good chance that either the SNP or the expression data will be missing, which crashes the SVD. The current fix is to set all summary statistics to NaN whenever this happens in a permutation run.

These "invalid" permutations should also not contribute to the computation of the empirical p-values or the median of the BFs -- is this the proper way to deal with it?

gaow commented 9 years ago

This patch implements paragraph 2 of the comment above: it excludes those invalid permutations from the sample space.

Before this patch, on a problematic dataset with 100 permutations, the result is:

gene	nb.snps	join.perm.pval	nb.permutations	true.l10abf	med.perm.l10abf
ENSG00000087495.12	34	0.0891089	100	-2.87E-01	-nan

With the patch:

gene	nb.snps	join.perm.pval	nb.permutations	true.l10abf	med.perm.l10abf
ENSG00000087495.12	34	0.6	14	-2.87E-01	-2.35E-01

Just not sure if this is the right thing to do.
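
For reference, the exclusion described above can be sketched in a few lines of Python. This is only an illustration, not the code in the patch: the function name `summarize_permutations`, the toy data, and the `(b + 1) / (n + 1)` form of the empirical p-value are all assumptions. The sketch drops permutations whose statistic is NaN, then computes the p-value and the median from the remaining ones.

```python
import numpy as np

def summarize_permutations(true_stat, perm_stats):
    """Hypothetical helper: empirical p-value and median over valid permutations.

    Permutations whose statistic is NaN (for example, because SVD failed on
    missing data) are excluded from the sample space.
    Assumes the estimator (b + 1) / (n + 1), where n is the number of valid
    permutations and b is how many of them are >= the observed statistic.
    """
    perm_stats = np.asarray(perm_stats, dtype=float)
    valid = perm_stats[~np.isnan(perm_stats)]  # keep only valid permutations
    n_valid = valid.size
    if n_valid == 0:
        # Nothing to compare against: report NaN rather than a misleading value.
        return float("nan"), 0, float("nan")
    n_extreme = int(np.sum(valid >= true_stat))
    pval = (n_extreme + 1) / (n_valid + 1)
    return pval, n_valid, float(np.median(valid))

# Synthetic toy data (not the real permutation values): 100 permutations,
# 86 invalid (NaN), 14 valid, 8 of which are >= the observed statistic.
true_stat = -0.287
perm = np.array([0.1] * 8 + [-0.5] * 6 + [np.nan] * 86)

pval, n_valid, med = summarize_permutations(true_stat, perm)
print(pval, n_valid, med)
```

Under that assumed estimator, both reported p-values would be reproduced by 8 permutations at least as extreme as the observed statistic: 9/101 ≈ 0.0891089 when all 100 permutations are counted, and 9/15 = 0.6 when only the 14 valid ones are.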