Open gaow opened 9 years ago
This patch implements paragraph 2 of the comment above -- exclude those invalid permutations from the sample space.
Before this patch, on a problematic data set with 100 permutations, the result is:
gene | nb.snps | join.perm.pval | nb.permutations | true.l10abf | med.perm.l10abf |
---|---|---|---|---|---|
ENSG00000087495.12 | 34 | 0.0891089 | 100 | -2.87E-01 | -nan |
With the patch:
gene | nb.snps | join.perm.pval | nb.permutations | true.l10abf | med.perm.l10abf |
---|---|---|---|---|---|
ENSG00000087495.12 | 34 | 0.6 | 14 | -2.87E-01 | -2.35E-01 |
Just not sure if this is the right thing to do.
The SVD error was caused by missing data: expression and SNP data are tissue-specific, so 1) some individuals will not have expression levels and 2) some SNPs will not be present in the data for some tissues. When the labels of individuals are shuffled in permutation testing, there is a good chance that either SNP or expression data are missing, which crashes the SVD. The current fix is to set all summary stats to NaN when this happens in a permutation run.
These "invalid" permutations should also not contribute to computation of empirical p-values and median of BFs -- is this the proper way to deal with it?
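To make the question concrete, here is a minimal Python sketch of what "excluding invalid permutations from the sample space" amounts to. It is not the actual code in the patch: the function name `summarize_permutations` and the estimator `(1 + k) / (1 + n)` are assumptions. The estimator was chosen only because it matches the numbers reported above (9/101 = 0.0891089 before the patch, 9/15 = 0.6 with it).

```python
import math
from statistics import median


def summarize_permutations(true_l10abf, perm_l10abfs):
    """Hypothetical helper: empirical p-value and median over valid permutations.

    A permutation is treated as invalid when its statistic is NaN
    (e.g. the SVD failed because of missing data).
    """
    # Drop invalid permutations so they are not part of the sample space.
    valid = [x for x in perm_l10abfs if not math.isnan(x)]
    n_valid = len(valid)
    if n_valid == 0:
        # Nothing to compare against: report NaN rather than a misleading value.
        return math.nan, 0, math.nan

    # Count valid permutations at least as extreme as the observed statistic.
    n_ge = sum(1 for x in valid if x >= true_l10abf)

    # Assumed estimator: add one to numerator and denominator.
    pval = (1 + n_ge) / (1 + n_valid)
    return pval, n_valid, median(valid)


# Toy data (not the real permutation values): 14 valid runs, 86 NaN runs.
perms = [-0.1] * 8 + [-0.5] * 6 + [math.nan] * 86
pval, n_valid, med = summarize_permutations(-0.287, perms)
print(pval, n_valid, med)
```

With this approach the denominator shrinks from 100 to the number of valid permutations, so the p-value has a coarser resolution and its smallest attainable value is 1 / (1 + n_valid).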