theislab / scib

Benchmarking analysis of data integration tools
MIT License

Questions about the principle of cell cycle conservation #293

Closed HelloWorldLTY closed 2 years ago

HelloWorldLTY commented 2 years ago

Hi, I wonder why cell cycle (CC) conservation can be used to evaluate the performance of data integration or batch correction. I have two main concerns:

  1. Why do we need to rely on the S and G2/M phases, and not G1? Are there any constraints?
  2. What is the principle of this method? Why does the variance need to be preserved? I believe this method intends to preserve the contribution of the different PCs' variance to the overall variance, but why does that work?

Thanks.

LuckyMD commented 2 years ago

Hi @HelloWorldLTY,

To answer your questions:

  1. We typically have published marker gene lists for S phase and for the G2/M phase. The standard assignment of phases relies on these marker lists. G1 phase is assigned to all cells not assigned to either of the other two phases.
  2. The idea is that the cell cycle (CC) should have the same overall variance contribution before and after integration. It doesn't have to contribute to the same PCs, but in general it should contribute similarly to the overall variance. If for some reason integration decreases the variance contribution of the cell cycle (CC phase may be regarded as a smaller contributor to the overall variance than, e.g., cell type), then biological variation is not conserved by that integration method.
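
To make point 2 concrete, here is a minimal sketch of the idea, not the scib implementation. It assumes a cells-by-features matrix and one continuous cell cycle score per cell (for example an S or G2/M score). The function names `variance_contribution` and `cc_conservation`, the `n_comps` default, and the clipping of the score at 0 are assumptions made for illustration. The variance contribution is estimated by regressing each principal component on the score and weighting the resulting R² values by the variance of each PC.

```python
import numpy as np


def variance_contribution(X, covariate, n_comps=10):
    """Share of the variance in the top PCs of X that a covariate explains.

    Hypothetical helper: per-PC R^2 of a simple linear regression on the
    covariate, weighted by the variance carried by each PC.
    """
    X = np.asarray(X, dtype=float)
    c = np.asarray(covariate, dtype=float)
    Xc = X - X.mean(axis=0)          # centre features before PCA
    c = c - c.mean()                 # centre the covariate

    # PCA via SVD: PC scores are U * S, PC variances are proportional to S**2
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    n_comps = min(n_comps, S.size)
    pcs = U[:, :n_comps] * S[:n_comps]
    pc_var = S[:n_comps] ** 2

    # For one predictor, R^2 equals the squared Pearson correlation
    r2 = np.zeros(n_comps)
    c_ss = c @ c
    for i in range(n_comps):
        pc_ss = pcs[:, i] @ pcs[:, i]
        if c_ss > 0 and pc_ss > 0:
            r2[i] = (c @ pcs[:, i]) ** 2 / (c_ss * pc_ss)

    return float((r2 * pc_var).sum() / pc_var.sum())


def cc_conservation(X_before, X_after, cc_score, n_comps=10):
    """1 means the CC variance share is unchanged; lower means it changed.

    Clipping at 0 is an assumption of this sketch.
    """
    before = variance_contribution(X_before, cc_score, n_comps)
    after = variance_contribution(X_after, cc_score, n_comps)
    return max(0.0, 1.0 - abs(after - before) / before)
```

With this sketch, an integration output that leaves the cell cycle signal untouched scores 1, while one that regresses the signal out entirely scores close to 0, even if cell types still separate well. Note that the cell cycle does not need to load on the same PCs before and after; only its weighted share of the variance is compared.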

Hope that's clear.