How to process the RNA-seq data: log10(count+1), log2(count+1), TPM, or FPKM? #44

sunduanchen / Scissor

Open wang99999shang opened 1 year ago

wang99999shang commented 1 year ago

When I use the raw counts of the RNA-seq data, the result is bad. Should I use log10(count+1), log2(count+1), TPM, or FPKM? Which one is better? Thank you!

Chen-Guanming commented 1 year ago

I guess the RNA-seq data should be normalized with Seurat::NormalizeData(), because when I checked the Scissor code I found that Scissor simply takes the normalized scRNA-seq data and binds it with the bulk data:

    common <- intersect(rownames(bulk_dataset), rownames(sc_dataset))
    sc_exprs <- as.matrix(sc_dataset@assays$RNA@data)
    dataset0 <- cbind(bulk_dataset[common,], sc_exprs[common,])
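To make the binding step above concrete, here is a minimal base R sketch on toy matrices. The gene names, sample names, and values are all made up for illustration, and plain matrices stand in for the Seurat object, so this only mirrors the quoted lines and is not Scissor's full procedure.

```r
# Toy stand-ins for the real inputs (all values are made up for illustration).
bulk_dataset <- matrix(
  c(5, 0, 12, 7, 3, 9),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("bulk1", "bulk2"))
)
sc_exprs <- matrix(
  c(0.0, 1.2, 2.3, 0.7, 0.0, 1.9),
  nrow = 3,
  dimnames = list(c("GeneB", "GeneC", "GeneD"), c("cell1", "cell2"))
)

# Keep only the genes present in both matrices, in the same order.
common <- intersect(rownames(bulk_dataset), rownames(sc_exprs))

# drop = FALSE keeps the matrix shape even if only one gene or sample remains.
dataset0 <- cbind(
  bulk_dataset[common, , drop = FALSE],
  sc_exprs[common, , drop = FALSE]
)
print(dataset0)
```

In this toy case the bulk columns hold raw-count-like values while the single-cell columns hold log-scale values, so the combined matrix mixes two scales unless the bulk data is transformed first.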

jzheng25 commented 1 year ago

I have similar questions. I think Seurat by default uses log normalization with a scale factor of 10000. Does that mean we should re-normalize FPKM or TPM as x/sum(x)*10000 and then take log1p? But normalize.quantile seems to do normalization by ranking. That leads to my question: does the independent variable (expression) need to be normally distributed to fit the assumptions of the underlying regression?
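For readers unsure what "normalization by ranking" does, here is a small base R sketch of rank-based quantile normalization. The function name `quantile_normalize` and the toy matrix are hypothetical, and a library implementation may handle ties and missing values differently, so treat this as an illustration of the idea rather than a drop-in replacement.

```r
# Rank-based quantile normalization, written out in base R for illustration.
# Assumes a numeric matrix with at least two rows and no missing values.
quantile_normalize <- function(x) {
  # Sort each column, then average across columns at each rank position.
  sorted <- apply(x, 2, sort)
  rank_means <- rowMeans(sorted)
  # Replace every value by the mean for its rank (ties broken by order of appearance).
  ranks <- apply(x, 2, rank, ties.method = "first")
  normalized <- apply(ranks, 2, function(r) rank_means[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy matrix: rows are genes, columns are samples (values are made up).
toy <- matrix(
  c(5, 2, 3, 4,
    4, 1, 4, 2,
    3, 4, 6, 8),
  nrow = 4,
  dimnames = list(paste0("Gene", 1:4), c("s1", "s2", "s3"))
)
qn <- quantile_normalize(toy)
print(qn)
```

After this step every column contains the same set of values, and only the ordering of genes within each sample is kept from the input.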

JZHT-jiangzhou commented 5 months ago

I think we can use the LogNormalize function to normalize the bulk RNA-seq data; the Seurat_preprocessing function shows: normalization.method = "LogNormalize", scale.factor = 10000
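A minimal base R sketch of the arithmetic discussed in this thread (x/sum(x)*10000, then log1p), applied per sample to a bulk matrix. The function name `log_normalize` and the toy counts are hypothetical, samples are assumed to be in columns, and nothing in this thread confirms that this is the preprocessing the Scissor authors recommend for bulk data.

```r
# Log-normalization as described in the thread: x / sum(x) * scale_factor, then log1p.
# Assumes genes in rows and samples in columns.
log_normalize <- function(counts, scale_factor = 10000) {
  # Divide each column (sample) by its total, scale, and take log(1 + x).
  scaled <- sweep(counts, 2, colSums(counts), "/") * scale_factor
  log1p(scaled)
}

# Toy bulk counts (values are made up for illustration).
bulk_counts <- matrix(
  c(100, 300, 600, 50, 50, 900),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("bulk1", "bulk2"))
)
bulk_norm <- log_normalize(bulk_counts)

# Undoing the log shows that every sample now sums to the scale factor.
print(colSums(expm1(bulk_norm)))
```

The same function can be applied to TPM or FPKM values, since it only rescales each column to a common total before the log transform.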