jtleek / sva-devel


Should the data be scaled before running ComBat? #55

Open: lincj1994 opened this issue 2 years ago

lincj1994 commented 2 years ago

Hi @wevanjohnson. I have a radiomics dataset with 8 batches. Some features span an extremely wide range of values (for example, from -1000 to 1e8), and the ranges also differ enormously between features: feature 1 runs from -0.1 to 0.9, while feature 2 runs from 1e4 to 1e8.

  1. I'm wondering whether the data should be preprocessed (e.g., scaled by feature) before running ComBat.
  2. Should I set par.prior to TRUE or FALSE? I tried running ComBat on the raw data with par.prior=TRUE, but it has been running for several days without finishing (it keeps showing "Finding parametric adjustments"). Should the parameter be TRUE when the input data have been properly preprocessed or normalized, and FALSE when they have not? Did I understand this correctly?

Thanks. Lin.

wevanjohnson commented 2 years ago

See below:

On Apr 23, 2022, at 8:36 PM, Caijin Lin wrote:


> I'm wondering whether the data should be preprocessed (e.g., scaled by feature) before running ComBat.

Yes, this might help for numerical-precision reasons. ComBat standardizes each feature as part of the process, but then "unscales" the data at the end.
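The following sketch makes the "scale, adjust, unscale" description concrete. It is a simplified location/scale adjustment in Python, using only the standard library, for the case with no covariates and no ref.batch. Each feature is standardized with its grand mean and pooled within-batch SD. The sketch then removes each batch's additive (mean) and multiplicative (variance) effect and maps the result back to the feature's original units. It deliberately skips ComBat's empirical Bayes shrinkage, so it is not the sva implementation, and `ls_adjust_sketch` is a hypothetical name.

```python
from statistics import fmean


def ls_adjust_sketch(dat, batch):
    """Simplified location/scale batch adjustment in the spirit of ComBat.

    dat:   list of features, each a list of sample values (features in rows,
           samples in columns, the orientation sva's ComBat expects).
    batch: one batch label per sample (column).

    Assumptions of this sketch: no covariates, no ref.batch, and NO empirical
    Bayes shrinkage (the per-batch estimates are used as-is). Each batch needs
    at least two samples, and each feature must vary within every batch.
    """
    n = len(batch)
    labels = sorted(set(batch))
    idx = {b: [j for j, lab in enumerate(batch) if lab == b] for b in labels}
    adjusted = []
    for row in dat:
        # Per-batch means of this feature.
        bmean = {b: fmean([row[j] for j in idx[b]]) for b in labels}
        # Grand mean: batch means weighted by batch size.
        grand = sum(len(idx[b]) * bmean[b] for b in labels) / n
        # Pooled variance: mean squared deviation from each sample's batch mean.
        var_pooled = sum((row[j] - bmean[batch[j]]) ** 2 for j in range(n)) / n
        sd = var_pooled ** 0.5
        # Step 1: standardize ("scale") the feature.
        z = [(x - grand) / sd for x in row]
        out = [0.0] * n
        for b in labels:
            zb = [z[j] for j in idx[b]]
            gamma = fmean(zb)  # additive batch effect on the standardized scale
            # Multiplicative batch effect: sample variance within the batch.
            delta2 = sum((v - gamma) ** 2 for v in zb) / (len(zb) - 1)
            for j in idx[b]:
                # Step 2: remove the batch location and scale effects.
                adj = (z[j] - gamma) / delta2 ** 0.5
                # Step 3: "unscale" back to the feature's original units.
                out[j] = adj * sd + grand
        adjusted.append(out)
    return adjusted
```

For example, take one feature with values [1, 2, 3, 10, 12, 14] and batches a, a, a, b, b, b. After adjustment, both batches become roughly [5.71, 7, 8.29]. Each feature is standardized by its own center and spread. As a result, a linear rescaling (x - c) / s applied to a feature beforehand only changes the units of the output, at least in exact arithmetic and in this sketch. This is consistent with pre-scaling mattering mainly for numerical precision.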

> Should I set par.prior to TRUE or FALSE? I tried running ComBat on the raw data with par.prior=TRUE, but it has been running for several days without finishing. Should the parameter be TRUE when the input data have been properly preprocessed or normalized, and FALSE when they have not? Did I understand this correctly? Thanks.

I would recommend par.prior=TRUE. Do you have any numerical covariates? If so, remove them; ComBat can only handle categorical covariates.
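Some context on par.prior may help here. With par.prior=TRUE, ComBat places a normal prior on each batch's additive effects and an inverse-gamma prior on its multiplicative effects. It estimates the prior's hyperparameters from all features by the method of moments, then iterates closed-form posterior updates until they stabilize. With par.prior=FALSE, a non-parametric prior is used instead, which is generally slower. The Python sketch below follows that parametric scheme for a single batch, using the update formulas as understood from sva's helper functions. The name `eb_parametric_one_batch` is hypothetical, and the zero-division guard in the convergence check is an addition of this sketch.

```python
from statistics import fmean, variance


def eb_parametric_one_batch(z_batch, conv=1e-4, max_iter=1000):
    """Sketch of parametric empirical Bayes estimates for ONE batch.

    z_batch: standardized data for a single batch, as a list of features,
             each a list of that batch's sample values (>= 2 samples).
    Returns (gamma_star, delta2_star): shrunken per-feature additive and
    multiplicative batch effects.
    """
    n = [len(r) for r in z_batch]
    g_hat = [fmean(r) for r in z_batch]
    d_hat = [variance(r) for r in z_batch]
    # Normal prior on the additive effects: hyperparameters pooled across features.
    g_bar, t2 = fmean(g_hat), variance(g_hat)
    # Inverse-gamma prior on the multiplicative effects, by method of moments.
    m, s2 = fmean(d_hat), variance(d_hat)
    a = (2 * s2 + m ** 2) / s2
    b = (m * s2 + m ** 3) / s2
    g_old, d_old = g_hat, d_hat
    for _ in range(max_iter):
        # Posterior mean of gamma: precision-weighted mix of g_hat and g_bar.
        g_new = [(t2 * ni * gh + d * g_bar) / (t2 * ni + d)
                 for ni, gh, d in zip(n, g_hat, d_old)]
        # Posterior estimate of delta^2 given the current gamma.
        d_new = [(0.5 * sum((x - g) ** 2 for x in r) + b) / (ni / 2 + a - 1)
                 for r, g, ni in zip(z_batch, g_new, n)]
        # Largest relative change (guarded against division by zero).
        change = max(abs(new - old) / max(abs(old), 1e-12)
                     for new, old in zip(g_new + d_new, g_old + d_old))
        g_old, d_old = g_new, d_new
        if change <= conv:
            break
    return g_old, d_old
```

Each posterior mean is a weighted average of the feature's own batch mean and the across-feature mean, so it always lies between the two. This pooling across features is what "parametric adjustments" refers to.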


lincj1994 commented 2 years ago

Hi. Thanks for your reply. I don't think I have included any numerical covariates. Below is the code I used:

Whole_Cohort_wave_Combat = ComBat(dat = Whole_Cohort_wave, batch = batch_info$Batch, par.prior=TRUE, ref.batch = "FUSCC_Aurora")

Also, I'm still wondering whether I should scale the data before running ComBat, since you mentioned that ComBat scales the data itself. And if scaling is necessary, should I scale the data by row (features) or by column (samples)? Thanks.
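A hedged illustration of the row-versus-column question follows. sva's ComBat takes a features-by-samples matrix (features in rows) and models each feature separately, so scaling "by feature" means scaling each row. Scaling by column would instead mix, within each sample, features whose ranges differ by several orders of magnitude. The sketch below is in Python with only the standard library. It centers and scales each row and keeps the per-row centers and scales so the corrected values can be mapped back to the original units. `scale_rows` and `unscale_rows` are hypothetical names. In R, base `scale()` works on columns, so the row-wise equivalent for this orientation is `t(scale(t(dat)))`.

```python
from statistics import fmean, stdev


def scale_rows(dat):
    """Center and scale each feature (row) of a features-by-samples matrix.

    Returns the scaled matrix plus the per-row centers and scales needed to
    undo the transformation after batch correction.
    """
    centers = [fmean(row) for row in dat]
    scales = [stdev(row) for row in dat]
    if any(s == 0 for s in scales):
        # A constant feature cannot be standardized (division by zero).
        raise ValueError("constant feature: remove it before scaling")
    scaled = [[(x - c) / s for x in row] for row, c, s in zip(dat, centers, scales)]
    return scaled, centers, scales


def unscale_rows(scaled, centers, scales):
    """Invert scale_rows: map each row back to its original units."""
    return [[x * s + c for x in row] for row, c, s in zip(scaled, centers, scales)]
```

A typical flow would be to call `scale_rows`, run the batch correction on the scaled matrix, and then call `unscale_rows` with the stored centers and scales.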