greenelab / multi-plier

An unsupervised transfer learning approach for rare disease transcriptomics
BSD 3-Clause "New" or "Revised" License

required normalization of input data #72

Closed lbdarragh closed 3 years ago

lbdarragh commented 3 years ago

Hello,

I am trying to run MultiPLIER on my own RNA-seq data. The code works well on the data you provided, but I believe my data is not normalized correctly to work with the code. How would you recommend I normalize my RNA-seq data so that it is compatible?

Thank you, Laurel

jaclyn-taroni commented 3 years ago

Hi Laurel,

The majority of the datasets we applied MultiPLIER to for this paper were microarray datasets that were on a log2 transformed scale.

In the past, we've applied MultiPLIER to RNA-seq data that has been transformed with the variance stabilizing transformation from DESeq2 (docs), which should be on a log2-like scale. Given that most of our experience has been with microarray data on a similar scale, and that the transformation includes correction for size factors per those docs, that would be my recommendation.
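
For readers who want a feel for what "correction for size factors" followed by a "log2-like scale" means, here is a minimal Python sketch. It is not the DESeq2 variance stabilizing transformation, and it is not code from this repository. It is a simplified stand-in that computes median-of-ratios size factors and then takes log2 of the normalized counts plus a pseudocount. The function names and the pseudocount of 1 are assumptions made for illustration. For real analyses, the recommendation above (DESeq2's transformation, run in R) still applies.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative helper, hypothetical name).

    counts: list of gene rows, each a list of raw counts per sample.
    Only genes with a nonzero count in every sample are used.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")

    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]

    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_normalized(counts, pseudocount=1.0):
    """Divide each sample by its size factor, then take log2(x + pseudocount).

    This is a simple log2-like transform, NOT DESeq2's VST.
    """
    factors = size_factors(counts)
    return [
        [math.log2(c / f + pseudocount) for c, f in zip(row, factors)]
        for row in counts
    ]


# Toy data (made up): sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 4],
]
toy_factors = size_factors(toy_counts)
toy_log2 = log2_normalized(toy_counts)
```

In the toy data the second sample has exactly twice the counts of the first for every gene that is nonzero in both, so its size factor comes out twice as large and those genes end up with matching values after normalization. The real transformation additionally stabilizes the variance across the range of mean expression, which this sketch does not attempt.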

In this paper, MultiPLIER was used with RNA-seq data that was quantified with Salmon, collapsed to the gene level with tximport, and converted to gene symbols with org.Hs.eg.db. This leads me to believe that TPM values were used, but I would have to check to be sure.

Thanks! Jaclyn

lbdarragh commented 3 years ago

Thank you! I got it working.

jaclyn-taroni commented 3 years ago

Great, glad to hear it! I am going to close this issue.