willwerscheid / flashier

A faster and angrier package for EBMF.
https://willwerscheid.github.io/flashier/
Other
10 stars 12 forks source link

Custom covariate driven shrinkage in flash #109

Open william-denault opened 11 months ago

william-denault commented 11 months ago

Hello @willwerscheid

I previously wrote to you on slack in the stephens labs I am trying to use flashier with a custom prior https://cran.r-project.org/web/packages/flashier/vignettes/advanced.html#customizing-the-order-of-operations I am struggling to specify the output of my function correctly within the flashier framework In my case I am trying to user flashier with a covariate moderated ash we implemented. Essentially the prior mixture of each observation is modified given some covariates, so in our case the pi argument in the prior doesn't make much sens.in addition I got this error message


Erreur dans value[[3L]](cond) : 
  Argument to ebnm.fn is incorrectly specified. See ?flash_ebnm for details.

Please find the function below. I think the output seems correctly specified so I am a bit puzzled

Looking forward to get your inputs. Best regards. William


set.seed(1)
devtools::install_github('karltayeb/logisticsusie@s3')
devtools::install_github('karltayeb/comoR')

library(comoR)
library(ebnm)
library(nnet)

theta <- c(rep(0, 100), rexp(100))
s <-rep( 1, length(theta))
x <- theta + rnorm(200, 0, s)

X <- matrix(rnorm(length(theta)*10 ), ncol=10)

cebnm_L <- function( x,s,g_init=FALSE,fix_g=TRUE, output){

  Z <- matrix( 1, nrow=length(x), ncol=1)

  param_como  = list(max_class=10,mnreg_type="mult_reg")
  param_nnet  =list( size=1, decay=1,MaxNWts = 10000)

  data <- comoR:::prep_data_como2 (betahat=x,
                                   se=s, X=X,
                                   Z =Z )
  fit <-  rlang::exec( "data_initialize_como", !!! param_como ,
                       data= data,
                       param_nnet=  param_nnet) # initialize the model from the data
  fit  <- comoR:::fit.como (  fit,  data, max_iter = 5 )

  est    <- comoR:::post_mean_sd  (fit,data)

  g <- ashr::normalmix(rep(1/length(fit$f_list),length(fit$f_list)),
                       rep( 0, length(fit$f_list)),
                       do.call(c, lapply( 1: length(fit$f_list)  ,
                                          function(k) {sqrt(fit$f_list [[k]]$var) } )
                       )
  )

  out <- list( data= data.frame(x=data$betahat,
                                s=data$se),
               posterior = data.frame(mean= est$mean,
                                      sd= est$sd
               ) ,
               fitted_g =  g,
               log_likelihood=sum( comoR:::compute_data_loglikelihood(fit, data) * (fit$post_assignment))

  )

  return( out)

}

res <- cebnm_L( x=theta ,
                s =s )
plot( res$posterior$mean, theta)

res.ebnm  <- ebnm(theta,s )

points( res.ebnm$posterior$mean,theta, col="green")
str(res)
str(res.ebnm)
willwerscheid commented 11 months ago

I get the following error:

> cebnm_L(x = c(-10, 0, 10),
+         s = rep(1, 3),
+         g_init = NULL,
+         fix_g = FALSE,
+         output = NULL)
Error in data_loglik + assignment_loglik : non-conformable arrays

I am unfamiliar with those packages so I am not sure what the error would be but the custom function needs to be able to handle those arguments.

william-denault commented 11 months ago

Hi @willwerscheid

the problem you face is that the function inside is using some covariate for the shrinkage and it is "hard coded"

see in the line data <- comoR:::prep_data_como2 (betahat=x, se=s, X=X, Z =Z ) X is internally used to use an individual specific mixture model (the weight of the ash prior depends on the entries of X)

So your x and s should have ta number of entry equal to the number of rows in X.

Given that X has 200 rows this is working.

cebnm_L(x =rnorm(200), s = rep(1, 200), g_init = NULL, fix_g = FALSE, output = NULL)

willwerscheid commented 11 months ago

Ok, then what you can do here is handle the case where length(x) == 3 just to satisfy the test function. Just give it something that looks like real output, eg if length(x) == 3 return ebnm_flat(args).

william-denault commented 11 months ago

Thanks.

I have a new error now. Error in v %*% X[[n]] : inadequte arguments

Does that ring a bell to you. I seems that loadings or factors are not of the right size right?

Below the exemple I used

`

`rm(list=ls()) library(comoR) library(nnet) library(comoR) simdata <- comoR:::sim_cfactor()

X <- simdata$X_l Y <- simdata$X_f

cebnm_L <- function( x,s,g_init=FALSE,fix_g=TRUE, output){

if (length(x) == 3){ ### just to satisfy check of custom function return (ebnm_flat(x)) } Z <- matrix( 1, nrow=length(x), ncol=1)

param_como = list(max_class=5,mnreg_type="mult_reg") param_nnet =list( size=1, decay=1,MaxNWts = 10000)

data <- comoR:::prep_data_como2 (betahat=x, se=s, X=X, Z =Z ) fit <- rlang::exec( "data_initialize_como", !!! param_como , data= data, param_nnet= param_nnet) # initialize the model from the data fit <- comoR:::fit.como ( fit, data, max_iter = 5 )

est <- comoR:::post_mean_sd (fit,data)

g <- ashr::normalmix(rep(1/length(fit$f_list),length(fit$f_list)), rep( 0, length(fit$f_list)), do.call(c, lapply( 1: length(fit$f_list) , function(k) {sqrt(fit$f_list [[k]]$var) } ) ) )

out <- list( data= data.frame(x=data$betahat, s=data$se), posterior = data.frame(mean= est$mean, sd= est$sd ) , fitted_g = g, log_likelihood=sum( comoR:::compute_data_loglikelihood(fit, data) * (fit$post_assignment))

)

return( out)

} cebnm_F <- function( x,s,g_init,fix_g=TRUE, output){ if (length(x) == 3){ ### just to satisfy check of custom function return (ebnm_flat(x)) }

Z <- matrix( 1, nrow=length(x), ncol=1) Z <- matrix( 1, nrow=length(x), ncol=1)

param_como = list(max_class=5,mnreg_type="mult_reg") param_nnet =list( size=1, decay=1,MaxNWts = 10000)

data <- comoR:::prep_data_como2 (betahat=x, se=s, X=Y, Z =Z ) fit <- rlang::exec( "data_initialize_como", !!! param_como , data= data, param_nnet= param_nnet) # initialize the model from the data fit <- comoR:::fit.como ( fit, data, max_iter = 5 )

est <- comoR:::post_mean_sd (fit,data)

g <- ashr::normalmix(rep(1/length(fit$f_list),length(fit$f_list)), rep( 0, length(fit$f_list)), do.call(c, lapply( 1: length(fit$f_list) , function(k) {sqrt(fit$f_list [[k]]$var) } ) ) )

out <- list( data= data.frame(x=data$betahat, s=data$se), posterior = data.frame(mean= est$mean, sd= est$sd ) , fitted_g = g, log_likelihood=sum( comoR:::compute_data_loglikelihood(fit, data) * (fit$post_assignment))

) return( out)

}

library(flashier) fit_custom <- flash_init(simdata$Y_obs, var_type = 2) %>% flash_set_verbose(0) %>% flash_greedy( Kmax = 2, ebnm_fn = c(cebnm_L, cebnm_F) ) `

willwerscheid commented 11 months ago

your return object out needs to include posterior$second_moment (posterior$sd is not needed). Here is a link to the help page that describes how the return object should be constructed (see under Details): https://willwerscheid.github.io/flashier/reference/flash_ebnm.html

william-denault commented 11 months ago

Thank you I am now able to match the reference you referede to for my ouput. Now flashier is performing 2 laoding update and one factor update before returning this error

Error in (-npoint):0 : result would be too long a vector In addition: Warning message: In log(tau) : NaNs produced

Do you have some ideas?

The updated script is found below

`rm(list=ls()) library(comoR) library(nnet) library(comoR) simdata <- comoR:::sim_cfactor() devtools::load_all(".")

X <- simdata$X_l Y <- simdata$X_f

cebnm_L <- function( x,s,g_init=FALSE,fix_g=TRUE, output){

if (length(x) == 3){ ### just to satisfy check of custom function return (ebnm_flat(x)) } Z <- matrix( 1, nrow=length(x), ncol=1)

param_como = list(max_class=5,mnreg_type="mult_reg") param_nnet =list( size=1, decay=1,MaxNWts = 10000)

data <- comoR:::prep_data_como2 (betahat=x, se=s, X=X, Z =Z ) fit <- rlang::exec( "data_initialize_como", !!! param_como , data= data, param_nnet= param_nnet) # initialize the model from the data fit <- comoR:::fit.como ( fit, data, max_iter = 5 )

est <- comoR:::post_mean_sd (fit,data)

g <- ashr::normalmix(rep(1/length(fit$f_list),length(fit$f_list)), rep( 0, length(fit$f_list)), do.call(c, lapply( 1: length(fit$f_list) , function(k) {sqrt(fit$f_list [[k]]$var) } ) ) )

out <- list( data= data.frame(x=data$betahat, s=data$se), posterior = data.frame(mean= est$mean, second_moment= est$sd^2 ) , fitted_g = g, log_likelihood=sum( comoR:::compute_data_loglikelihood(fit, data) * (fit$post_assignment))

)

return( out)

} cebnm_F <- function( x,s,g_init,fix_g=TRUE, output){ if (length(x) == 3){ ### just to satisfy check of custom function return (ebnm_flat(x)) }

Z <- matrix( 1, nrow=length(x), ncol=1) Z <- matrix( 1, nrow=length(x), ncol=1)

param_como = list(max_class=5,mnreg_type="mult_reg") param_nnet =list( size=1, decay=1,MaxNWts = 10000)

data <- comoR:::prep_data_como2 (betahat=x, se=s, X=Y, Z =Z ) fit <- rlang::exec( "data_initialize_como", !!! param_como , data= data, param_nnet= param_nnet) # initialize the model from the data fit <- comoR:::fit.como ( fit, data, max_iter = 5 )

est <- comoR:::post_mean_sd (fit,data)

g <- ashr::normalmix(rep(1/length(fit$f_list),length(fit$f_list)), rep( 0, length(fit$f_list)), do.call(c, lapply( 1: length(fit$f_list) , function(k) {sqrt(fit$f_list [[k]]$var) } ) ) )

out <- list( data= data.frame(x=data$betahat, s=data$se), posterior = data.frame(mean= est$mean, second_moment= est$sd^2 ) , fitted_g = g, log_likelihood=sum( comoR:::compute_data_loglikelihood(fit, data) * (fit$post_assignment))

) return( out)

}

library(flashier) fit_custom <- flash_init(simdata$Y_obs, var_type = 2) %>% flash_set_verbose(0) %>% flash_greedy( Kmax = 2, ebnm_fn = c(cebnm_L, cebnm_F) ) `

willwerscheid commented 10 months ago

Hi @william-denault , sorry! I missed this issue when it was initially posted. second_moment should be the posterior second moment E(X^2) = Var(X) + (EX)^2. You are currently returning posterior variance rather than posterior second moment.