Improve the label of CS when some effects are filtered out

stephenslab / susieR

R package for "sum of single effects" regression.

https://stephenslab.github.io/susieR

Other

174 stars 44 forks source link

Improve the label of CS when some effects are filtered out #94

Closed gaow closed 4 years ago

gaow commented 4 years ago

Here is output of susie_get_cs on one example we have:

> susie_get_cs(fit,X,min_abs_corr = 0)
$cs
$cs$L2
[1] 3 4
$cs$L1
[1] 1 2 3 4 5
$purity
   min.abs.corr mean.abs.corr median.abs.corr
L2       0.9881        0.9941          0.9941
L1       0.7041        0.8648          0.8717
$cs_index
[1] 2 1
$coverage
[1] 0.95
> fit$alpha
           [,1]       [,2]   [,3]   [,4]    [,5]
[1,] 0.26705710 0.26939717 0.1503 0.1629 0.15035
[2,] 0.26535287 0.26766796 0.1516 0.1641 0.15125
[3,] 0.26502052 0.26733071 0.1518 0.1644 0.15142
[4,] 0.26646037 0.26879156 0.1507 0.1634 0.15066
[5,] 0.00002937 0.00003051 0.6622 0.3220 0.01578

but clearly L2 should correspond to the 5th effect where variable 3 and 4 should be in the CS, as one can tell from fit$alpha. So it should read $cs$L5 not currently cs$L2.

Also related to #93 fit$alpha should also have row and column names when applicable.

gaow commented 4 years ago

@pcarbo sorry but would you provide me a minimal working example you had for reproducing this issue?

pcarbo commented 4 years ago

@gaow Here is a minimal working example:

library(mvtnorm)
library(susieR)
ns <- 1000
n  <- 1000
b  <- c(0,1,1,0,0)
S  <- rbind(c(   1, 0.99,  0.7,  0.7, 0.9),
            c(0.99,    1,  0.7,  0.7, 0.9),
            c( 0.7,  0.7,    1, 0.99, 0.8),
            c( 0.7,  0.7, 0.99,    1, 0.8),
            c( 0.9,  0.9,  0.8,  0.8,   1))
set.seed(1)
X <- rmvnorm(n,sigma = S)
y <- drop(X %*% b + 3*rnorm(n))
fit <- susie(X,y,L = 5,s_init = susie_init_coef(5:1,rep(1,5),5))
susie_get_cs(fit)
round(fit$alpha,digits = 2)

Using susieR 0.9.0.580, I got:

> susie_get_cs(fit)
$cs
$cs[[1]]
[1] 3 4

$cs[[2]]
[1] 1 2

$coverage
[1] 0.95

> round(fit$alpha,digits = 2)
     [,1] [,2] [,3] [,4] [,5]
[1,] 0.20 0.20 0.20 0.20  0.2
[2,] 0.20 0.20 0.20 0.20  0.2
[3,] 0.00 0.00 0.78 0.22  0.0
[4,] 0.20 0.20 0.20 0.20  0.2
[5,] 0.22 0.78 0.00 0.00  0.0

But I don't see how I can use this output to map the CSs to the rows of fit$alpha. Did I do the right thing?

gaow commented 4 years ago

@pcarbo I see. We typically dont do that additionally step with susie_get_cs that does not factor in purity. eg below works:

> fit <- susie(X,y,L = 5,s_init = susie_init_coef(5:1,rep(1,5),5))
> fit$sets
$cs
$cs$L3
[1] 3 4

$cs$L5
[1] 1 2

$purity
   min.abs.corr mean.abs.corr median.abs.corr
L3    0.9906903     0.9953452       0.9953452
L5    0.9897910     0.9948955       0.9948955

$cs_index
[1] 3 5

$coverage
[1] 0.95

I just pushed a patch though to fix the case when the function is called without purity filter:

> fit <- susie(X,y,L = 5,s_init = susie_init_coef(5:1,rep(1,5),5))
> susie_get_cs(fit)
$cs
$cs$L3
[1] 3 4

$cs$L5
[1] 1 2

$coverage
[1] 0.95

Adding rownames to alpha requires changing the saved results of unit testing. I postpone it to #93 . Please close this ticket as you see fit.

pcarbo commented 4 years ago

@gaow It is looking good now. Thanks.

Adding rownames to alpha requires changing the saved results of unit testing.

Or you could ignore the row and column names in your unit tests.