theislab / cellrank

CellRank: dynamics from multi-view single-cell data
https://cellrank.org
BSD 3-Clause "New" or "Revised" License

Real Schur Decomposition High Values #1123

Closed. AlinaKurjan closed this issue 1 year ago

AlinaKurjan commented 1 year ago

Hi all,

Thank you so much for developing this wonderful package. I was wondering if you could comment on the "usual" real Schur decomposition values that you would expect to see when decomposing the data into 20 components. I am working with developmental tissue data with about 30k cells, and running the Schur decomposition for the first 20 components yields real parts between 0.9 and 1.0 for all of them (eigengaps are identified, but the actual drops in the real part are minimal). Is this something you would expect to see with this number of cells or with developmental data? Does it matter? (I'm asking because in the tutorials your values range between 0.5 and 1.0 for the first 20 components.)

[Attached image: plot of the real part of the top 20 eigenvalues]

If I understand right, does it indicate 1-2 major states and a lot of intermediate cell states in the data?

Any help or tips with interpretation of those results would be greatly appreciated!

Marius1311 commented 1 year ago

Hi @AlinaKurjan, I think this spectrum looks fine; the range you get for the real part of the top 20 eigenvalues is not unusual. In our initial and terminal states tutorial, the number of cells is much smaller (approx. 2,500) than in your dataset, so I would expect your spectrum to be somewhat more "crowded". Keep in mind that the real part has to be between -1 and 1, so the additional eigenvalues from the larger matrix have to go "somewhere".
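To illustrate the bound on the real part, here is a minimal NumPy sketch. It does not use CellRank at all: the matrix is a random row-stochastic stand-in for a transition matrix, and `n_cells` is a hypothetical size chosen only to keep the example fast.

```python
import numpy as np

# Assumption: a random dense matrix with positive entries stands in for a
# real cell-cell transition matrix; n_cells is a hypothetical toy size.
rng = np.random.default_rng(0)
n_cells = 200
A = rng.random((n_cells, n_cells))

# Normalise each row to sum to 1, which makes T row-stochastic.
T = A / A.sum(axis=1, keepdims=True)

eigvals = np.linalg.eigvals(T)

# Sort by real part, largest first, and keep the top 20.
top = eigvals[np.argsort(-eigvals.real)][:20]

# All eigenvalues of a row-stochastic matrix lie in the unit disk,
# so every real part is confined to [-1, 1].
max_modulus = np.abs(eigvals).max()
print("largest modulus:", max_modulus)
print("top real parts:", np.round(top.real, 3))
```

A random matrix like this one has no metastable structure, so only the leading eigenvalue sits near 1. The sketch shows the bound itself, not the crowding you would see in real data.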

Your spectrum indicates that 3 or 11 might be good options to test for the number of macrostates. The algorithm would not allow you to compute 2 macrostates, as eigenvalues 2 and 3 are complex conjugates of each other and they want to "stay together" (we describe this more formally in the Methods section of the CellRank 1 paper, in case you're interested). If you request 2 macrostates, it will automatically compute 3 and give you a warning.
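The "stay together" rule can be sketched in a few lines of NumPy. This is not CellRank's implementation: `adjust_n_states` is a hypothetical helper, and the eigenvalues are made-up numbers chosen so that eigenvalues 2 and 3 form a conjugate pair, as in your spectrum.

```python
import numpy as np

def adjust_n_states(eigvals, n_states, tol=1e-8):
    """Return n_states, increased by one if the cut would split a conjugate pair.

    Hypothetical helper for illustration; assumes eigvals are sorted so that
    the two members of a complex conjugate pair are adjacent.
    """
    eigvals = np.asarray(eigvals, dtype=complex)
    if n_states >= len(eigvals):
        return n_states
    last_kept = eigvals[n_states - 1]
    first_dropped = eigvals[n_states]
    splits_pair = (
        abs(last_kept.imag) > tol
        and abs(first_dropped - np.conj(last_kept)) <= tol
    )
    return n_states + 1 if splits_pair else n_states

# Made-up spectrum: eigenvalues 2 and 3 are complex conjugates of each other.
eigs = [1.0, 0.98 + 0.01j, 0.98 - 0.01j, 0.97, 0.96]
adjusted = {n: adjust_n_states(eigs, n) for n in (1, 2, 3, 4)}
print(adjusted)  # {1: 1, 2: 3, 3: 3, 4: 4}
```

Requesting 2 states lands between the two members of the pair, so the helper moves the cut to 3. Requests of 1, 3, or 4 are left unchanged.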

Now, to figure out whether the macrostates you identify with, say, n=11 states are initial, intermediate, or terminal, I suggest looking at the coarse-grained transition matrix (see the tutorial I linked above) and using any prior knowledge about the system that you might have. For example, some macrostates might overlap with clusters you have already annotated in your data, and you might know where they reside in the differentiation hierarchy. Or you might have experimental time points in your dataset, so macrostates with many early-day cells are more likely to represent initial states, etc.
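As a rough sketch of the time-point idea, you could compute the fraction of earliest-day cells in each macrostate. Everything here is toy data: the `macrostate` and `day` arrays are hypothetical per-cell labels, and treating day 0 as the earliest time point is an assumption.

```python
import numpy as np

# Hypothetical per-cell annotations: macrostate assignment and sampling day.
macrostate = np.array(["A", "A", "A", "B", "B", "C", "C", "C"])
day = np.array([0, 0, 1, 3, 3, 1, 3, 3])
early_day = 0  # assumption: day 0 is the earliest experimental time point

# Fraction of cells in each macrostate that come from the earliest day.
early_fraction = {
    str(state): float(np.mean(day[macrostate == state] == early_day))
    for state in np.unique(macrostate)
}

# The macrostate richest in early-day cells is a candidate initial state.
initial_candidate = max(early_fraction, key=early_fraction.get)
print(early_fraction)
print("candidate initial state:", initial_candidate)
```

This only ranks candidates. It is still worth checking the result against the coarse-grained transition matrix and any cluster annotations you already have.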

Let me know whether this helps!

AlinaKurjan commented 1 year ago

Awesome, thank you for such a detailed reply and it definitely helps! :)