stephens999 / ashr

An R package for adaptive shrinkage
GNU General Public License v3.0
80 stars 36 forks source link

Add a new argument "gpart" #72

Open mengyin opened 7 years ago

mengyin commented 7 years ago

I add a new argument "gpart" in the main ash function. "gpart" provides some mixture components that we want to keep fixed when estimating the mode.

This is motivated by Joyce's analysis on scRNA-seq data, where we would like to fit Poisson ash with a bimodal prior. The prior is a mixture of two unimodal distribution: g_0, a unimodal distribution with mode at 0; and g_1, a unimodal distribution with non-zero mode. In this example we want to search g_1's non-zero mode while keeping g_0's components fixed.

stephens999 commented 7 years ago

Thanks @mengyin I'd like to avoid the main ash function becoming overburdened with additional options. Can you achieve what you want here with a wrapper to the ash function rather than by adding a parameter? eg. the following wrapper implements the search for a mode: https://github.com/stephens999/ashr/blob/master/R/ashn.R (in this case I did allow the user to call this wrapper by calling the original ash with mode="estimate", for convenience. But for more specialist uses we might just make them call the wrapper).

A related thought: one could write a general wrapper for doing ash over bimodal distributions. And then allowing for one or both modes to be fixed (which i think is what you are essentially doing here?)

We can discuss both points in person, or further here....

mengyin commented 7 years ago

Thanks @stephens999 for the comments! I agree that too many parameters will overburden the main function. However I am not sure how to easily achieve what I want here without adding a parameter...

To fit a distribution with unknown mode(s), we need an objective function for optimization, which calls ash (with a fixed g which depends on the given modes) and returns the log-likelihood. So the question here is whether to set g inside the main ash function or in the wrapper function?

Say, in ashn.R, the objective function test.op(c) simply calls ash(...,mode=c,...), because with "mode=c" the main ash function will automatically set g. In my example, I added a parameter such that the main ash function sets a bimodal g. I could replace it by calling a wrapper function like

ash.estbimodal = function(betahat,modemin,modemax,...){ test.op = function(c){ g = set_g(mode=c,second_mode=c2,...) # set bimodal g given the modes return(-ash(betahat=betahat,g=g,outputlevel="loglik",...)$loglik) } opt = stats::optimize(test.op,interval=c(modemin,modemax)) return(opt$minimum) }

However, the key step "set_g" here needs copy all those lines in the main ash function (almost 100 lines from "##2. Generating mixture distribution g") so I am not sure if this is better than directly editing the main ash function. Or do you think in ash.R we should modularize this part and put it into a separate function set_g?

stephens999 commented 7 years ago

I 100% think we should modularize this to a set_g function - great idea!

On Tue, Jun 6, 2017 at 4:41 PM, Mengyin Lu notifications@github.com wrote:

Thanks for the comments! I agree that too many parameters will overburden the main function. However I am not sure how to easily achieve what I want here without adding a parameter...

To fit a distribution with unknown mode(s), we need an objective function for optimization, which calls ash (with a fixed g which depends on the given modes) and returns the log-likelihood. So the question here is whether to set g inside the main ash function or in the wrapper function?

Say, in ashn.R, the objective function test.op(c) simply calls ash(...,mode=c,...), because with "mode=c" the main ash function will automatically set g. In my example, I added a parameter such that the main ash function sets a bimodal g. I could replace it by calling a wrapper function like

ash.estbimodal = function(betahat,modemin,modemax,...){ test.op = function(c){ g = set_g(mode=c,second_mode=c2,...) # set bimodal g given the modes return(-ash(betahat=betahat,g=g,outputlevel="loglik",...)$loglik) } opt = stats::optimize(test.op,interval=c(modemin,modemax)) return(opt$minimum) }

However, the key step "set_g" here need copy all those lines in the main ash function (almost 100 lines from "##2 https://github.com/stephens999/ashr/pull/2. Generating mixture distribution g") so I am not sure if this is better than directly editing the main ash function. Or do you think in ash.R we should modularize this part and put it into a separate function set_g?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stephens999/ashr/pull/72#issuecomment-306625790, or mute the thread https://github.com/notifications/unsubscribe-auth/ABt4xQmvYQk7OA57cxvVuObBWneoEDLbks5sBceCgaJpZM4NtJxh .