saezlab / transcriptutorial

This is a tutorial to guide the analysis of an RNAseq dataset using footprint-based tools such as DOROTHEA, PROGENY and CARNIVAL
https://saezlab.github.io/transcriptutorial/
GNU General Public License v3.0
Batch correction #24

Closed: ulrmu closed this issue 3 years ago

ulrmu commented 3 years ago

Oh hai, thank you for developing this pipeline! Do you have any suggestions on how to handle/preprocess data from different batches upstream of the differential analysis? Cheers!

adugourd commented 3 years ago

Hi, it depends on the type of data, but tools such as VSN are usually pretty flexible for normalising your data, and the removeBatchEffect function of limma is a simple and efficient way to remove batch effects. You may want to try ComBat as well for your batch effect, if your data doesn't have missing values.
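
To make the batch-effect step concrete, here is a minimal sketch in plain Python of the basic idea: for each gene, estimate how far each batch sits from a common level and subtract that offset. This is an illustration only, not the implementation used by limma or ComBat. The function names `center_batches` and `correct_matrix` and the toy values are hypothetical, and the sketch assumes log-scale, already normalised values, no missing data, and a single batch factor.

```python
from statistics import mean


def center_batches(values, batches):
    """Shift each batch of one gene's values onto a common mean."""
    # Group the gene's values by batch label.
    by_batch = {}
    for value, batch in zip(values, batches):
        by_batch.setdefault(batch, []).append(value)

    # Mean of the gene within each batch, and the average of those means.
    batch_means = {b: mean(vs) for b, vs in by_batch.items()}
    target = mean(list(batch_means.values()))

    # Subtract each sample's batch offset (batch mean minus target).
    return [v - (batch_means[b] - target) for v, b in zip(values, batches)]


def correct_matrix(matrix, batches):
    """Apply the per-gene centering to every row (genes x samples)."""
    return [center_batches(row, batches) for row in matrix]


# Hypothetical toy data: 2 genes x 4 samples, two batches of two samples.
expr = [
    [5.0, 6.0, 8.0, 9.0],  # gene 1: batch B sits 3 units above batch A
    [2.0, 4.0, 2.0, 4.0],  # gene 2: no batch shift
]
batches = ["A", "A", "B", "B"]

corrected = correct_matrix(expr, batches)
print(corrected)  # [[6.5, 7.5, 6.5, 7.5], [2.0, 4.0, 2.0, 4.0]]
```

Plain centering like this also removes any real biological difference that happens to be confounded with batch. In limma, removeBatchEffect accepts a design argument for the conditions that should be preserved, which is one reason to prefer the dedicated tools over a hand-rolled correction.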

ulrmu commented 3 years ago

Thank you for your quick reply! Does this mean you can replace the normalization method used in part 1 with any normalization method that also does batch correction? For example, can the output of either of the methods you propose be used as input to runLimma in part 2?

ulrmu commented 3 years ago

Alternatively, it seems that runLimma can accept a regress_out argument which calls removeBatchEffect. Using this strategy, batch correction would occur after the normalization in part 1. Could you comment on whether this would be an acceptable approach? :)

adugourd commented 3 years ago

regress_out is still kind of an experimental feature. I would not advise using it at the moment :) I haven't taken the time to complete it yet.

I would say that yes, any normalisation and batch effect correction is acceptable, at the user's discretion :) There are many possible solutions to that question.

ulrmu commented 3 years ago

Thanks for the advice :) I will do the batch correction in an upstream step. Thanks again!