theislab / single-cell-tutorial

Single-cell current best practices tutorial case study for the paper: Luecken and Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial"

Scale before calculating gene scores and after regressing out cell cycle? #103

carmensandoval closed this issue 1 year ago

carmensandoval commented 2 years ago


Hello,

According to the cell cycle notebook, one should scale adata prior to running score_genes_cell_cycle.

However, there is a second scaling step after regressing out the cell-cycle scores. Is this the correct way to do it? Is the second call to sc.pp.scale just going back to the log-normalized data, or is it scaling an already scaled matrix?

Thanks

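```python
import scanpy as sc

# adata: AnnData object with raw counts, loaded elsewhere
# s_genes, g2m_genes: lists of S-phase and G2/M marker gene names, defined elsewhere
sc.pp.filter_genes(adata, min_cells=50)
sc.pp.normalize_total(adata, target_sum=1e4)

adata.raw = adata  # keep the normalized (not yet log-transformed) values

sc.pp.log1p(adata)
sc.pp.scale(adata)  # first scaling
sc.tl.score_genes_cell_cycle(adata, s_genes=s_genes, g2m_genes=g2m_genes)

sc.pp.highly_variable_genes(adata)
adata = adata[:, adata.var.highly_variable].copy()  # copy to avoid working on a view

sc.pp.regress_out(adata, ['S_score', 'G2M_score'])
sc.pp.scale(adata)  # second scaling, after regressing out the cell-cycle scores
```
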
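What the second `sc.pp.scale` call operates on can be illustrated with a small numpy sketch. `sc.pp.scale` modifies `adata.X` in place, so after the first call the log-normalized values are no longer in `adata.X`; the second call sees the residuals left by `regress_out`. The helpers `zscore` and `regress_out` below are hypothetical stand-ins, assuming scaling is a per-gene z-score and regressing out is a per-gene linear fit with an intercept; they mimic the idea, not scanpy's exact implementation (no `max_value` clipping, synthetic data).

```python
import numpy as np

def zscore(X):
    # Per-gene (column) centering and division by the standard deviation,
    # the core idea behind sc.pp.scale; zero-variance genes are left at 0.
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return (X - mu) / sd

def regress_out(X, covariates):
    # Ordinary least squares per gene (intercept + covariates), keeping the
    # residuals; roughly what sc.pp.regress_out does for continuous keys.
    design = np.column_stack([np.ones(X.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ beta

rng = np.random.default_rng(0)
n_cells, n_genes = 500, 50
scores = rng.normal(size=(n_cells, 2))      # stand-ins for S_score and G2M_score
loadings = rng.normal(size=(2, n_genes))
log_data = np.log1p(rng.poisson(3.0, size=(n_cells, n_genes))) + scores @ loadings

scaled = zscore(log_data)
# Scaling an already-scaled matrix changes nothing.
print(np.allclose(zscore(scaled), scaled))              # True

residuals = regress_out(scaled, scores)
# Residuals stay centered, but each gene's variance drops below 1 ...
print(residuals.std(axis=0, ddof=1).max() < 1)          # True

rescaled = zscore(residuals)
# ... so the second scaling re-standardizes the residuals,
print(np.allclose(rescaled.std(axis=0, ddof=1), 1.0))   # True
# and it does not bring back the log-normalized values.
print(np.allclose(rescaled, log_data))                  # False
```

In this sketch the second scaling is neither a no-op nor a return to the log-normalized data: it rescales the residuals back to unit variance per gene.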
LuckyMD commented 1 year ago

Hi!

The tutorial you are referring to is pretty old and does indeed scale twice. That is one of the reasons I wrote this current best-practices tutorial for single-cell analysis. I would normally not scale the data at all before calling cell-cycle stages, as is also shown in the tutorial in this GitHub repo.

carmensandoval commented 1 year ago

Awesome - thanks for your response!