hbctraining / scRNA-seq_online

https://hbctraining.github.io/scRNA-seq_online/.

DESeq2 normalization #102

Closed by jc271828 11 months ago

jc271828 commented 1 year ago

Hi!

Thank you for writing the super helpful workbook (https://hbctraining.github.io/scRNA-seq_online/lessons/pseudobulk_DESeq2_scrnaseq.html) on using DESeq2 to do DE analysis with scRNA-seq data! I was wondering why, in the step that plots normalized counts for the top 20 genes (a little past midway through the workbook; search for "top20_sig_df" to find the gray code block), the normalized counts are log10 transformed?

I understand that we're plotting "value" in the data frame top20_sig_df. And top20_sig_df is a subset of normalized_counts, with normalized_counts being calculated from normalized_counts <- counts(dds, normalized = TRUE).

I just can't find in DESeq2 vignettes that the normalized counts have been log10 transformed. I thought it was just a normalization using the "size factors". Please correct me if I'm wrong!

Thank you so much!

Jingxian

jc271828 commented 1 year ago

@rkhetani

jc271828 commented 1 year ago

Also, are baseMean and the normalized counts on different scales (i.e., one is log-transformed and one is not)?
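To make the question concrete, here is a minimal base-R sketch of what size-factor normalization computes on a toy matrix. It assumes the median-of-ratios method that DESeq2 uses by default to estimate size factors; the count values and gene/sample names are made up for illustration, and the toy data contain no zeros (real data would need zero handling). Logs appear only as an intermediate step to obtain geometric means; the resulting normalized counts and their per-gene mean stay on the linear scale.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers, not from the workbook)
counts <- matrix(
  c(10,  20,  40,
    100, 200, 400,
    5,   10,  20,
    50,  100, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Median-of-ratios size factors:
# 1) per-gene geometric mean across samples (computed via logs),
# 2) per-sample median of the ratio count / geometric mean
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(col) {
  exp(median(log(col) - log_geo_means))
})

# Normalized counts: each sample's column divided by its size factor
normalized_counts <- sweep(counts, 2, size_factors, "/")

# Per-gene mean of the normalized counts (what baseMean summarizes)
base_mean <- rowMeans(normalized_counts)

print(size_factors)
print(normalized_counts)
print(base_mean)
```

In this toy case each sample is an exact multiple of the others, so after dividing by the size factors every gene has the same value in all three samples, and no log transform is applied to the output.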

mistrm82 commented 11 months ago

Hi @jc271828,

Yes, you are right that the normalized counts are not log transformed (and neither is the value in the baseMean column).

The log10 transform you are referring to is applied only in the visualization of the data. The reason we use scale_y_continuous(trans = 'log10') is that gene expression data are heavily skewed on a linear scale, even after being normalized. Some genes are expressed at incredibly low levels, and others sit at the extreme high end. The log10 transform makes the data more symmetrical for visualization purposes. Note that the underlying data remain unchanged.
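As a rough illustration of that last point, the sketch below builds a plot with scale_y_continuous(trans = "log10") and checks that the plotted column is untouched. The data frame toy_df and its values are hypothetical stand-ins, not the workbook's top20_sig_df, and the plot is deliberately simpler than the one in the lesson.

```r
library(ggplot2)

# Hypothetical normalized counts spanning several orders of magnitude
toy_df <- data.frame(
  gene = rep(c("geneA", "geneB", "geneC"), each = 2),
  samplename = rep(c("s1", "s2"), times = 3),
  value = c(3, 5, 120, 150, 9000, 11000)
)
value_before <- toy_df$value

# The log10 transform is attached to the y-axis scale, not to the data
p <- ggplot(toy_df, aes(x = gene, y = value)) +
  geom_point() +
  scale_y_continuous(trans = "log10") +
  ylab("normalized counts (log10 axis)")

# The stored values are identical before and after building the plot
stopifnot(identical(toy_df$value, value_before))

# Contrast: an explicit transform creates new, different values
toy_df$log10_value <- log10(toy_df$value)
print(toy_df)
```

Printing p would draw the points on a log-scaled y-axis with tick labels still in the original units, whereas the log10_value column holds genuinely transformed numbers.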