jtleek / sva-devel

28 stars 46 forks source link

question regarding ComBat(mean.only=TRUE) vs. FALSE #11

Open abelew opened 8 years ago

abelew commented 8 years ago

Greetings, I have been playing with various sva/ComBat parameters and noticed that for some data sets I receive an error when mean.only==TRUE, but not FALSE. The error is centered the creation of delta.hat and the following density plot of it. It seems that when this occurs, the rbind() leads to a delta.hat data frame which is entirely 1, leading to a failure in creating the density plot. I am guessing that this is indicative of a problem in the data, but I am not sure what it might be. Do you have any suggestions off-hand? I forked the repository and am poking through that section to see what I might learn. Thank you for your time.

wevanjohnson commented 8 years ago

Sorry about this. It works on my data and examples. Can you send me (offline is fine) a simple example or small subset of your data (can be fewer genes and samples) that fails. I will be happy to take a look and fix any bugs if needed.

abelew commented 8 years ago

Greetings! I would be happy to, but don't want to waste your time -- is there a format which would be most useful? I am thinking that a .rda with the expressionsets and experimental design might be most useful for diagnosing what is going on, would that be best? I will check in with my professor though to ensure that he does not mind me sending along count tables. I have a strong suspicion that the only problem with the code is that it is not seeing a weakness in the data when scale is TRUE. I spent a little time last night poking at ComBat.r and am reasonably certain that the problem happens at the creation of delta.hat on line

  1. I would love to understand the code a little more, may I query you for neophyte-proof definitions of B.hat, gamma.hat, delta.hat, and gamma.bar, and the *.star variables? I suspect that if I had read the sva paper more carefully I might know the answers to that question, so if your answer is 'look at the paper' I will completely understand.

    Thank you for your time, -atb

On 05/03/2016 11:45 AM, W. Evan Johnson wrote:

Sorry about this. It works on my data and examples. Can you send me (offline is fine) a simple example or small subset of your data (can be fewer genes and samples) that fails. I will be happy to take a look and fix any bugs if needed.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/jtleek/sva-devel/issues/11#issuecomment-216571953

wevanjohnson commented 8 years ago

Sounds great. An .rda would be perfect along with your ComBat command. If you are worried about sharing the data, you could remove sample and row names, and an abbreviated set of data, say 100-1000 genes is sufficient. I'm wondering if its a function of your data, e.g. if you have all zeroes for a batch or treatment combo that is blowing up the estimates. Alternatively, if its a bug in ComBat I want to fix it ASAP. Thanks!

SamGG commented 3 months ago

Hi, I got the same error. I checked with the data provided in the example. When mean.only=TRUE, ComBat() fails. @wevanjohnson could you verify your test? In my hand:

Browse[1]> a.prior
[1] Inf Inf Inf Inf Inf
Browse[1]> b.prior
[1] Inf Inf Inf Inf Inf
Browse[1]> summary(c(delta.hat))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       1       1       1       1       1 

I run

library(bladderbatch)
data(bladderdata)
bladderEset
dat <- bladderEset[1:50,]
pheno = pData(dat)
edata = exprs(dat)
batch = pheno$batch
mod = model.matrix(~as.factor(cancer), data=pheno)
combat_edata1 = ComBat(dat=edata, batch=batch, mod=NULL, par.prior=TRUE, prior.plots=TRUE, mean.only = TRUE)

Best.