rhondabacher / SCnorm

Normalization for single cell RNA-seq data

Input for SCnorm #22

Closed zji90 closed 6 years ago

zji90 commented 6 years ago

Just wondering whether different kinds of input measures affect the results. The manual states that "Estimates of gene expression are typically obtained using RSEM, HTSeq, Cufflinks, Salmon or similar approaches." These tools seem to generate different gene expression measures (counts and FPKM/TPM). Is count data the recommended input type? Also, for the normalized gene expression counts, are there any specific steps recommended before downstream analysis such as PCA or differential analysis?

rhondabacher commented 6 years ago

Hi Zhicheng,

Thanks for using SCnorm!

The input should be un-normalized gene expression obtained from those methods. They do not necessarily need to be exact counts since measures like RSEM give non-integer expression in the form of Expected Counts. TPM/FPKM/RPKM force the sequencing depth to be exactly one million and so SCnorm would no longer be able to estimate the relationship of each gene's expression versus the sequencing depth. I would use whatever expression measure existed prior to converting to TPM/FPKM/RPKM.

I hope that helps and please don't hesitate to contact me if you have any further questions.

Best, Rhonda
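
To illustrate the point above about TPM, here is a minimal sketch in plain Python. The counts, gene lengths, and helper names (`sequencing_depth`, `to_tpm`) are made up for illustration and are not part of SCnorm. It shows that two cells sequenced at very different depths end up with identical totals (and here identical profiles) after TPM conversion, so the depth information a count-depth model would need is no longer in the data.

```python
# Hypothetical toy data: un-normalized counts for three genes in two cells.
# cell_B was sequenced four times as deeply as cell_A.
gene_lengths_kb = [1.0, 2.0, 4.0]
raw_counts = {
    "cell_A": [10.0, 40.0, 50.0],
    "cell_B": [40.0, 160.0, 200.0],
}


def sequencing_depth(counts):
    """Total counts in a cell, used here as a simple proxy for depth."""
    return sum(counts)


def to_tpm(counts, lengths_kb):
    """Standard TPM: length-scale each gene, then rescale the cell to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


depth = {cell: sequencing_depth(c) for cell, c in raw_counts.items()}
tpm = {cell: to_tpm(c, gene_lengths_kb) for cell, c in raw_counts.items()}
tpm_totals = {cell: sum(values) for cell, values in tpm.items()}

print("raw depth:", depth)        # differs between the two cells
print("TPM totals:", tpm_totals)  # the same for both cells, up to rounding
```

On the raw counts the two cells differ in depth by a factor of four; after conversion that difference cannot be recovered from the TPM values alone.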

zji90 commented 6 years ago

Thanks for the reply! I am wondering whether the following procedure is good practice for downstream analysis, particularly dimension reduction: get normalized counts from SCnorm, log2-transform them, divide each gene's log-normalized counts by its gene length, and then run PCA, etc.

rhondabacher commented 6 years ago

Yes, but you might consider dividing by the length before applying the log. For example, if gene X has twice as many counts as gene Y but gene X is also twice as long, then I would want their value going into the PCA to be equal, which means you'd want to do the length correction prior to the log. Otherwise, that seems fine to me.

-Rhonda
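
The gene X / gene Y example above can be checked with a few lines of Python. This is only a sketch: the counts and lengths are invented, the function names are hypothetical, and no pseudocount is added because both counts are nonzero (real data with zeros would need one).

```python
import math

# Hypothetical normalized counts: gene X has twice the counts of gene Y
# and is also twice as long.
genes = {
    "X": {"count": 200.0, "length_kb": 2.0},
    "Y": {"count": 100.0, "length_kb": 1.0},
}


def length_then_log(count, length_kb):
    """Divide by gene length first, then take log2."""
    return math.log2(count / length_kb)


def log_then_length(count, length_kb):
    """Take log2 first, then divide by gene length."""
    return math.log2(count) / length_kb


length_first = {g: length_then_log(**v) for g, v in genes.items()}
log_first = {g: log_then_length(**v) for g, v in genes.items()}

print("length correction before log:", length_first)  # X and Y are equal
print("length correction after log: ", log_first)     # X and Y differ
```

Dividing by length before the log gives both genes the same value, which is the behavior described above; dividing after the log does not, because the log has already compressed the twofold count difference into an additive offset.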

zji90 commented 6 years ago

Thanks!