Closed kaizen89 closed 1 year ago
Hi @kaizen89,
Essentially it depends on your assumptions. I would personally focus on the lowest possible resolution that is also stable across samples. This is also related to how one handles missing values.
Say you have a cell state that is highly relevant but present only in the disease samples. First, you would need to ensure that it is not filtered out. Second, because the tensor is implemented in such a way that missing values are masked in the PARAFAC decomposition step (@earmingol can confirm), the decomposition will ignore missing values (None/NULL). So, if you assign None to the mostly missing cell state, it will not contribute to the separation of the samples. You would instead need to encode the missing values as 0s, which essentially means you assume that the absence of that cell state in some samples is biologically relevant.
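To make the masking point concrete, here is a minimal sketch in plain Python. It is not the library's actual code: the scores, the two candidate reconstructions, and the helper names are all made up for illustration. It only shows why a masked fit cannot use missing entries to separate samples, while a zero-filled fit can.

```python
def masked_sq_error(observed, reconstructed):
    """Squared error summed over observed entries only; None entries are masked."""
    return sum(
        (o - r) ** 2
        for o, r in zip(observed, reconstructed)
        if o is not None
    )


def zero_fill(values):
    """Encode missing entries as 0.0 instead of masking them."""
    return [0.0 if v is None else v for v in values]


# Hypothetical scores for one cell state across four samples:
# two controls (cell state absent -> None) and two disease samples.
scores = [None, None, 0.8, 0.9]

# Two hypothetical reconstructions a decomposition could produce.
flat = [0.85, 0.85, 0.85, 0.85]  # no difference between control and disease
split = [0.0, 0.0, 0.85, 0.85]   # separates control from disease

# Masked: both reconstructions fit equally well, so nothing favours separation.
masked_flat = masked_sq_error(scores, flat)
masked_split = masked_sq_error(scores, split)

# Zero-filled: the reconstruction that separates the groups fits much better.
filled = zero_fill(scores)
zero_flat = masked_sq_error(filled, flat)
zero_split = masked_sq_error(filled, split)

print("masked:     ", masked_flat, masked_split)
print("zero-filled:", zero_flat, zero_split)
```

With masking, the two reconstructions have the same error because the control entries never enter the sum. With zero-filling, the flat reconstruction is penalised for every control sample, which is the sense in which 0s encode "absence is informative".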
As you can see, this is not a trivial question and it's very context dependent.
I hope this helps.
Just to make sure I understand correctly: you do not see any advantage in using a higher resolution?
I don't see an advantage in using cell states that are unstable across the samples. Keep those identities for which you think a change across samples would be meaningful. 🙂
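One rough way to check which identities are "stable across samples" is to count cells per identity in each sample and keep the identities that are sufficiently represented in enough samples. The sketch below uses only the standard library; the function name, the thresholds, and the toy data are hypothetical and would need adapting to your own annotations.

```python
from collections import Counter, defaultdict


def stable_identities(cells, min_cells=2, min_sample_fraction=1.0):
    """Return identities with >= min_cells cells in at least
    min_sample_fraction of all samples.

    `cells` is a list of (sample, identity) pairs, one per cell.
    """
    counts = Counter(cells)  # (sample, identity) -> number of cells
    samples = {sample for sample, _ in cells}

    # Number of samples in which each identity passes the cell-count threshold.
    n_passing = defaultdict(int)
    for (_, identity), n in counts.items():
        if n >= min_cells:
            n_passing[identity] += 1

    return sorted(
        identity
        for identity, k in n_passing.items()
        if k / len(samples) >= min_sample_fraction
    )


# Toy data: "Exhausted T" is present only in the disease samples.
cells = []
for sample in ("ctrl_1", "ctrl_2", "dis_1", "dis_2"):
    cells += [(sample, "T cell")] * 2 + [(sample, "NK cell")] * 2
    if sample.startswith("dis"):
        cells += [(sample, "Exhausted T")] * 2

print(stable_identities(cells))
print(stable_identities(cells, min_sample_fraction=0.5))
```

Identities that only pass with a relaxed threshold are the ones where you have to decide how to handle missing values, as discussed above.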
Hi, I was wondering what the best way is to choose the `idents_col`: either use broad cell types such as T cells, NK cells, macrophages, etc., or use a greater resolution? And in the latter case, would this increase the computational load? Thanks!