Should idents be broad cell types or well defined subpopulations?

kaizen89 commented 1 year ago

Hi, I was wondering what the best way to choose the idents_col is: should I use broad cell types such as Tcells, NKcells, Macrophages, etc., or a higher resolution? And in the latter case, would this increase the computational load? Thanks!

dbdimitrov commented 1 year ago

Hi @kaizen89,

Essentially it depends on your assumptions. I would personally focus on the lowest possible resolution that is also stable across samples. This is also related to how one handles missing values.

Say you have a cell state that is highly relevant, but present only in the disease samples. Then you would first need to ensure that it's not filtered out. Second, because tensor is implemented in such a way that missing values are masked in the PARAFAC decomposition step (@earmingol can confirm), it will ignore missing values (None/NULL). So, if you assign None to the mostly missing cell state, then it will not contribute to the separation of the samples. Thus, you would need to assign missing values as 0s, which essentially means that you assume that the absence of that cell state in some samples is biologically relevant.
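To make the difference between masking and zero-filling concrete, here is a minimal toy sketch in plain NumPy. It does not use the LIANA or Tensor-cell2cell API and does not run a PARAFAC decomposition; the sample names, scores, and the `masked_sq_distance` helper are all hypothetical and only illustrate how an ignored (masked) entry versus a 0 changes how far apart two samples look.

```python
import numpy as np

# Toy scores (hypothetical values): rows = samples, columns = cell identities.
# The last column is a cell state that exists only in the disease samples.
samples = ["ctrl_1", "ctrl_2", "dis_1", "dis_2"]
scores = np.array([
    [1.0, 2.0, np.nan],
    [1.0, 2.0, np.nan],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
])


def masked_sq_distance(a, b):
    """Squared distance using only entries observed (non-NaN) in both vectors."""
    observed = ~np.isnan(a) & ~np.isnan(b)
    return float(np.sum((a[observed] - b[observed]) ** 2))


# Option 1: leave missing values masked, so the disease-only state is ignored.
d_masked = masked_sq_distance(scores[0], scores[2])

# Option 2: fill missing values with 0, so absence is treated as real signal.
filled = np.nan_to_num(scores, nan=0.0)
d_filled = masked_sq_distance(filled[0], filled[2])

print(d_masked, d_filled)  # 0.0 9.0
```

With masking, the control and disease samples look identical in this toy case; with zero-filling, the disease-only state is what separates them.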

As you can see, this is not a trivial question and it's very context dependent.

I hope this helps.

kaizen89 commented 1 year ago

Just to make sure I understand correctly, you do not see any advantage in using a higher resolution?

dbdimitrov commented 1 year ago

I don't see an advantage in using cell states that are unstable across the samples. Keep those identities for which you think a change across samples is meaningful. 🙂
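One rough way to check which identities are stable across samples is to count cells per sample and identity and keep the identities that pass a minimum count everywhere. The sketch below uses only the Python standard library; the labels, the `MIN_CELLS` threshold, and the variable names are hypothetical and are not taken from LIANA.

```python
from collections import Counter

# Hypothetical (sample, identity) labels, one tuple per cell.
cells = (
    [("ctrl_1", "Tcells")] * 2
    + [("ctrl_1", "Macrophages")] * 2
    + [("dis_1", "Tcells")] * 2
    + [("dis_1", "Macrophages")] * 2
    + [("dis_1", "Exhausted_Tcells")] * 2
)

MIN_CELLS = 2  # hypothetical threshold, chosen only for this toy example

counts = Counter(cells)  # missing (sample, identity) pairs count as 0
samples = sorted({sample for sample, _ in cells})
idents = sorted({ident for _, ident in cells})

# An identity is "stable" here if every sample has at least MIN_CELLS of it.
stable = [i for i in idents if all(counts[(s, i)] >= MIN_CELLS for s in samples)]
unstable = [i for i in idents if i not in stable]

print(stable)    # ['Macrophages', 'Tcells']
print(unstable)  # ['Exhausted_Tcells']
```

Identities that end up in `unstable` are the ones where you have to decide explicitly between dropping them, masking them, or treating their absence as 0.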

saezlab / liana

Should idents be broad cell types or well defined subpopulations? #116