jmbreda / Sanity

Filtering of Poisson noise on a single-cell RNA-seq UMI count matrix
GNU General Public License v3.0

Problem with negative values: summed expression of all genes in a cell is approximately -250000 #16

Closed: maximelepetit closed this issue 2 years ago

maximelepetit commented 2 years ago

"The gene expression levels are normalized to 1, meaning that the summed expression of all genes in a cell is approximately 1. Note that we use the natural logarithm, so to change the normalization one should multiply the exponential of the expression by the wanted normalization (e.g. mean or median number of captured gene per cell)."
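The rescaling the README describes can be sketched in R as below. This is a minimal illustration on a small toy matrix (hypothetical data, not real Sanity output), assuming Sanity's values are natural-log expression whose exponentials sum to about 1 per cell. The target of 10,000 is an arbitrary, hypothetical choice; the README suggests e.g. the mean or median number of captured genes per cell.

```r
# Toy stand-in for Sanity's output (hypothetical data): a genes x cells matrix
# of natural-log expression whose exponentials sum to 1 in every cell.
set.seed(1)
lin <- matrix(runif(5 * 3), nrow = 5, ncol = 3)  # 5 genes x 3 cells, all positive
lin <- sweep(lin, 2, colSums(lin), "/")          # make each column sum to exactly 1
log_expr <- log(lin)                             # natural log, like the Sanity values

colSums(log_expr)         # negative: every entry is the log of a number below 1
colSums(exp(log_expr))    # about 1 per cell once back on the linear scale

# Change the normalization: exponentiate, then multiply by the wanted total.
target <- 1e4             # hypothetical target, e.g. a mean or median per-cell total
scaled <- exp(log_expr) * target
colSums(scaled)           # about 10000 per cell

# On the log scale the rescaling is only a constant shift by log(target)
log_scaled <- log(scaled)
all.equal(log_scaled, log_expr + log(target))
```

Because the rescaling adds the same constant `log(target)` to every log value, it changes the scale of the numbers but not the relative differences between genes or cells.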

When I sum the expression of all genes in a cell, the value is approximately -250000:

> head(colSums(df_norm_m1))
AAACCCACATCACCAA AAACCCAGTAGTAAGT AAACCCAGTGCTTATG AAACCCATCAGTGCGC AAACCCATCCCGATCT AAACGAAAGTGCAAAT 
       -251847.7        -251944.1        -251777.1        -251511.2        -252574.6        -253305.9 

I don't understand this behaviour, because when I sum the exponential of the expression of all genes in a cell, the values are close to 1:

> head(colSums(exp(df_norm_m1)))
AAACCCACATCACCAA AAACCCAGTAGTAAGT AAACCCAGTGCTTATG AAACCCATCAGTGCGC AAACCCATCCCGATCT AAACGAAAGTGCAAAT 
       0.9324338        1.0953718        0.9208313        1.0439884        0.9802271        0.9916398  

I don't understand why I need to take the exponential to get a summed expression close to 1.
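A short derivation may help explain both results, assuming (as the quoted README states) that the reported values are natural logarithms $\ln x_{gc}$ of linear-scale expression levels $x_{gc}$, with $\sum_g x_{gc} \approx 1$ over the $N$ genes of cell $c$. The normalization holds on the linear scale, so exponentiating recovers it. Summing the logs directly gives a large negative number, because by Jensen's inequality (the logarithm is concave) the sum is bounded above:

$$
\sum_{g=1}^{N} e^{\ln x_{gc}} = \sum_{g=1}^{N} x_{gc} \approx 1,
\qquad
\sum_{g=1}^{N} \ln x_{gc} \;\le\; N \ln\!\left(\frac{1}{N}\sum_{g=1}^{N} x_{gc}\right) \approx -N \ln N .
$$

Equality holds only if every gene has the same expression $1/N$; uneven expression pushes the sum further down. For a hypothetical $N = 20{,}000$ genes, $-N \ln N \approx -198{,}000$, so column sums of the log values on the order of $-250{,}000$ are what one would expect, not a sign of an error.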