saezlab / liana

LIANA: a LIgand-receptor ANalysis frAmework
https://saezlab.github.io/liana/
GNU General Public License v3.0

Help with experimental design for comparison #97

Closed: GuiSeSanz closed this issue 1 year ago.

GuiSeSanz commented 1 year ago

Thanks a lot for this really useful tool! We have been using it for some simple analyses, but some questions arose when we moved to a more complex design.

We are working with oncological data: 3 healthy control samples and 4 patient samples. The cells in each sample are classified into 13 different cell types. The patient samples contain both cancer cells and non-cancerous cells.

We are interested in studying the different interactions:

  1. among the cancerous cells
  2. between the cancerous cells and the non-cancerous ones

Our main objective is to identify interactions specific to the patients. We were thinking of using the interactions found in the controls as a baseline, so that basal interactions can be removed.
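The baseline idea in the question amounts to a set difference: keep the interactions called in patients that are never called in any control. The sketch below is a minimal illustration of that naive approach only, not something LIANA provides or recommends; the tuple layout `(source, target, ligand, receptor)` and the toy calls are hypothetical.

```python
from typing import Iterable, Set, Tuple

# An interaction is identified by (source cell type, target cell type, ligand, receptor).
Interaction = Tuple[str, str, str, str]


def patient_specific_interactions(
    patient_hits: Iterable[Interaction],
    control_hits: Iterable[Interaction],
) -> Set[Interaction]:
    """Return interactions called in patients but never in any control (naive baseline removal)."""
    return set(patient_hits) - set(control_hits)


# Hypothetical toy calls pooled across samples (not real results).
controls = [
    ("T cell", "Macrophage", "CD40LG", "CD40"),
    ("Macrophage", "T cell", "CXCL10", "CXCR3"),
]
patients = [
    ("T cell", "Macrophage", "CD40LG", "CD40"),
    ("Malignant", "T cell", "CD274", "PDCD1"),
]

print(sorted(patient_specific_interactions(patients, controls)))
```

Note that any interaction involving a cell type that only exists in patients (here the malignant cells) survives this subtraction trivially, so the result mixes "new" interactions with interactions from patient-only cell types.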

We are following the Tensor-cell2cell decomposition tutorial, and we have questions about the `sample_key`, `condition_key` and `groupby` parameters.

We thought that `sample_key` should be the sample identifier (patientID and controlID), that `condition_key` should be the combination of the cell type with the category of the cell (cancerous, non-cancerous or healthy), and that `groupby` should remain the cell type.

Will this model yield predictions between healthy cells and cancerous/non-cancerous cells? Such interactions are not really possible, so should we filter them out? Is this experimental design correct?

dbdimitrov commented 1 year ago

Hi @GuiSeSanz,

The way you assign `sample_key` sounds correct: it should ultimately identify the individual samples. Also, `groupby` (`idents_col` in R) should be the cell types.

`condition_key` should simply be metadata associated with the samples alone (e.g. cancerous or healthy), not with individual cells.
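One way to check that a condition column really is sample-level metadata is to verify that every sample carries exactly one condition label. This is a minimal sketch assuming per-cell metadata stored as dictionaries; the column names `sample`, `condition` and `cell_type` and the helper `sample_conditions` are hypothetical, not part of LIANA.

```python
from collections import defaultdict
from typing import Dict, List


def sample_conditions(cells: List[Dict[str, str]], sample_key: str, condition_key: str) -> Dict[str, str]:
    """Map each sample to its condition, failing if a sample carries more than one condition label."""
    seen = defaultdict(set)
    for cell in cells:
        seen[cell[sample_key]].add(cell[condition_key])
    mapping = {}
    for sample, conditions in seen.items():
        if len(conditions) != 1:
            raise ValueError(f"sample {sample!r} has multiple conditions: {sorted(conditions)}")
        mapping[sample] = next(iter(conditions))
    return mapping


# Hypothetical per-cell metadata (column names are assumptions, not required by LIANA).
cells = [
    {"sample": "control1", "condition": "healthy", "cell_type": "T cell"},
    {"sample": "control1", "condition": "healthy", "cell_type": "Macrophage"},
    {"sample": "patient1", "condition": "cancer", "cell_type": "Malignant"},
    {"sample": "patient1", "condition": "cancer", "cell_type": "T cell"},
]
print(sample_conditions(cells, sample_key="sample", condition_key="condition"))

# The originally proposed condition (cell type x category) varies within a sample, so it is rejected.
bad = [dict(c, condition=f"{c['cell_type']}_{c['condition']}") for c in cells]
try:
    sample_conditions(bad, "sample", "condition")
except ValueError as err:
    print(err)
```

The second call shows why combining cell type and category into the condition does not fit: that label changes from cell to cell inside one sample, whereas the cell type already belongs in `groupby`.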

Tensor decomposition is hypothesis- and design-free. Conditions are therefore used only for visualization: the factorization returns factors that may separate the conditions, but only if the condition label is actually a driver of change.
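To make the "design-free" point concrete: Tensor-cell2cell factorizes a 4D tensor of samples (contexts) × ligand-receptor pairs × sender cell types × receiver cell types, and the condition label is not one of its axes. The sketch below only shows how such a tensor could be laid out from per-sample results; it is not LIANA's `to_tensor_c2c` conversion, the decomposition itself is not shown, and the row format and toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple

# One per-sample result row: (sample, ligand^receptor, source cell type, target cell type, score).
Row = Tuple[str, str, str, str, float]


def build_tensor(rows: Sequence[Row]):
    """Arrange per-sample scores as a 4D nested list indexed [sample][lr][source][target].

    Combinations never observed in a sample are left as None (missing)."""
    samples = sorted({r[0] for r in rows})
    lrs = sorted({r[1] for r in rows})
    cell_types = sorted({r[2] for r in rows} | {r[3] for r in rows})
    s_idx = {s: i for i, s in enumerate(samples)}
    lr_idx = {lr: i for i, lr in enumerate(lrs)}
    c_idx = {c: i for i, c in enumerate(cell_types)}
    n = len(cell_types)
    tensor: List = [[[[None] * n for _ in range(n)] for _ in lrs] for _ in samples]
    for sample, lr, source, target, score in rows:
        tensor[s_idx[sample]][lr_idx[lr]][c_idx[source]][c_idx[target]] = score
    return tensor, samples, lrs, cell_types


# Hypothetical toy scores (not real results).
rows = [
    ("control1", "CD40LG^CD40", "T cell", "Macrophage", 0.8),
    ("patient1", "CD40LG^CD40", "T cell", "Macrophage", 0.5),
    ("patient1", "CD274^PDCD1", "Malignant", "T cell", 0.9),
]
tensor, samples, lrs, cell_types = build_tensor(rows)
print(len(samples), len(lrs), len(cell_types), len(cell_types))  # 2 2 3 3

# The condition label never enters the tensor; it is only attached to the sample axis
# afterwards, e.g. to colour the sample loadings of each factor.
conditions = {"control1": "healthy", "patient1": "cancer"}
sample_labels = [conditions[s] for s in samples]
```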

Tensor decomposition also assumes that most cell types are shared between the samples. So, it could give you interactions coming from cell types absent in one condition (e.g. malignant cells), or alternatively interactions from cell types whose cell-cell communication (CCC) changes between conditions. This ultimately depends on how you assign your cell type labels and on the assumptions you make about cell types/states that are absent in one condition.
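Before building the tensor, it can help to list which cell types are shared by all samples and which are missing from some, since the latter (e.g. malignant cells in the controls) become missing entries rather than measured ones. A minimal sketch, assuming the metadata is available as `(sample, cell_type)` pairs; the helper `cell_type_coverage` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def cell_type_coverage(pairs: Iterable[Tuple[str, str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Given (sample, cell_type) pairs, return the cell types present in every sample
    and, for every other cell type, the samples in which it is missing."""
    by_sample = defaultdict(set)
    for sample, cell_type in pairs:
        by_sample[sample].add(cell_type)
    all_types = set().union(*by_sample.values())
    shared = set.intersection(*by_sample.values()) if by_sample else set()
    missing = {
        ct: {s for s, types in by_sample.items() if ct not in types}
        for ct in all_types - shared
    }
    return shared, missing


# Hypothetical toy metadata.
obs = [
    ("control1", "T cell"), ("control1", "Macrophage"),
    ("patient1", "T cell"), ("patient1", "Macrophage"), ("patient1", "Malignant"),
]
shared, missing = cell_type_coverage(obs)
print(sorted(shared))  # ['Macrophage', 'T cell']
print(missing)  # {'Malignant': {'control1'}}
```

How the condition-specific cell types are labelled (and whether they are kept) is the modelling choice referred to above; this check only makes it explicit.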

We are currently working on in-depth tutorials combining LIANA and Tensor-cell2cell. I can share them once they are public.

> Will this model yield predictions between healthy cells and cancerous/non-cancerous cells? Such interactions are not really possible, so should we filter them out?

Yes, you should include only those cell types between which interactions are plausible. It is best to do this filtering before inferring interactions for each sample.
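A minimal sketch of filtering before inference, assuming a hand-written allowlist of plausible cell types per condition; `ALLOWED`, `subset_for_inference` and the toy cells are hypothetical. Assuming inference is run separately per sample, as in the Tensor workflow, cells from control samples are never paired with cells from patient samples, so the filter only needs to act on cell types within each sample.

```python
from typing import Dict, List, Set

# Hypothetical allowlist: cell types whose interactions are considered plausible in each condition.
ALLOWED: Dict[str, Set[str]] = {
    "healthy": {"T cell", "Macrophage"},
    "cancer": {"Malignant", "T cell", "Macrophage"},
}


def subset_for_inference(
    cells: List[Dict[str, str]],
    allowed: Dict[str, Set[str]],
    condition_key: str = "condition",
    groupby: str = "cell_type",
) -> List[Dict[str, str]]:
    """Drop cells whose cell type is not in the allowlist for their sample's condition."""
    return [c for c in cells if c[groupby] in allowed.get(c[condition_key], set())]


# Hypothetical toy metadata; "Unassigned" stands for any label judged implausible.
cells = [
    {"sample": "control1", "condition": "healthy", "cell_type": "T cell"},
    {"sample": "control1", "condition": "healthy", "cell_type": "Macrophage"},
    {"sample": "patient1", "condition": "cancer", "cell_type": "Malignant"},
    {"sample": "patient1", "condition": "cancer", "cell_type": "Unassigned"},
]

kept = subset_for_inference(cells, ALLOWED)
print([(c["sample"], c["cell_type"]) for c in kept])
# Per-sample inference would then run on `kept`; that LIANA call is not shown here.
```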

Hope this helps.