liana_tensor_c2c error: All input columns must be contained in all dataframes included in 'context_dict'

alvarezprado commented 5 months ago

Hi,

Thanks for developing LIANA, it's a truly wonderful tool. I'm trying to run tensor decomposition on my data to compare cell-cell communication between two conditions (WT and KO for our gene of interest), following the documentation, but I get an error I don't know how to interpret when I run the liana_tensor_c2c function:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : AssertionError: All input columns must be contained in all dataframes included in 'context_dict'

This is the output of reticulate::py_last_error()$r_trace$full_call

Python Exception Message

Traceback (most recent call last):
  File "/Users/angel/Library/Caches/org.R-project.R/R/basilisk/1.15.2004/liana/0.1.13/liana_cell2cell/lib/python3.8/site-packages/cell2cell/tensor/external_scores.py", line 117, in dataframes_to_tensor
    assert all([c in df.columns for c in cols for df in context_df_dict.values()]), "All input columns must be contained in all dataframes included in 'context_dict'"
AssertionError: All input columns must be contained in all dataframes included in 'context_dict'

R Traceback

▆
└─liana::liana_tensor_c2c(...)
  └─c2c$tensor$dataframes_to_tensor(...)
    └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)

Do you know what might be the problem here? Any input will be much appreciated.

Thank you!

dbdimitrov commented 5 months ago

@earmingol this seems like a custom exception. Any suggestions? @alvarezprado I assume it's something to do with the naming of the columns that you're passing? Maybe the scores that you use?

Is this the tutorial that you're running: https://ccc-protocols.readthedocs.io/en/latest/notebooks/ccc_R/QuickStart.html

earmingol commented 5 months ago

Yes, the issue is related to the column names passed as input: one or more of them do not match the column names in the dataframes.
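For anyone hitting the same assertion, a minimal Python sketch of the check quoted in the traceback may help locate the mismatch. It works on plain lists of column names rather than real dataframes, and reports which requested columns are absent from which context. The helper `find_missing_columns` and the WT/KO column lists are hypothetical illustrations, not part of cell2cell or LIANA.

```python
def find_missing_columns(context_columns, cols):
    """Return {context: [missing columns]} for every context that lacks a requested column.

    context_columns: mapping of context name -> iterable of column names
    cols: the column names passed as input (score, sender, receiver, ligand, receptor)
    """
    missing = {}
    for context, columns in context_columns.items():
        present = set(columns)
        absent = [c for c in cols if c not in present]
        if absent:
            missing[context] = absent
    return missing


# Hypothetical example: the KO context has no "LRscore" column.
context_columns = {
    "WT": ["source", "target", "ligand.complex", "receptor.complex", "LRscore"],
    "KO": ["source", "target", "ligand.complex", "receptor.complex", "magnitude_rank"],
}
cols = ["source", "target", "ligand.complex", "receptor.complex", "LRscore"]

missing = find_missing_columns(context_columns, cols)
print(missing)  # {'KO': ['LRscore']}
```

An empty result means every context contains every requested column, which is the condition the assertion enforces. With pandas dataframes, the same helper could be fed `{name: df.columns for name, df in context_df_dict.items()}`.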

alvarezprado commented 5 months ago

Thanks a lot @dbdimitrov and @earmingol for the fast replies. The tutorial I'm following is this one: https://saezlab.github.io/liana/articles/liana_cc2tensor.html

Maybe the discrepancy between the names arises because I changed the identities of the Seurat object (RenameIdents function) to annotate clusters before converting it to a SingleCellExperiment object? I'm not sure which dataframes I should check, so I'm enclosing below a subset of the sce object to make things easier.

Thank you!

alvarezprado_sce_liana.rds.zip

earmingol commented 5 months ago

Hmm, everything seems right, maybe try passing columns explicitly:

sce <- liana_tensor_c2c(sce = sce,
                        score_col = "LRscore",
                        sender_col = "source", # Just added this
                        receiver_col = "target", # Just added this
                        ligand_col = "ligand.complex", # Just added this
                        receptor_col = "receptor.complex", # Just added this
                        rank = 7,  # set to NULL to estimate the rank for your data!
                        how = "outer",  # defines how the tensor is built
                        conda_env = NULL, # used to pass an existing conda env with cell2cell
                        use_available = FALSE # detect & load cell2cell if available
                        )

The only thing that looks odd to me in your object is that the ligand and receptor columns (as well as ligand.complex and receptor.complex) contain names like "F10", "F3", and "F7". Maybe there is something wrong with the gene names?

Anyway, I recommend trying this tutorial instead:

https://ccc-protocols.readthedocs.io/en/latest/notebooks/ccc_R/QuickStart.html

And install the tools as indicated here:

https://github.com/saezlab/ccc_protocols/tree/main/env_setup

alvarezprado commented 5 months ago

Thanks @earmingol!! Indeed, you spotted a big mistake on my side: I forgot to pass my custom resource (mouse genes) to the liana_bysample function, which caused the L-R pairs to be restricted to F genes. I ran the pipeline again, now using the appropriate resource, and I'm getting a different error:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  ValueError: negative dimensions are not allowed

I will start over, following the tutorial you recommended, and see if the error still happens.

Thank you!

earmingol commented 5 months ago

Not sure what this error could be about. Exactly what code generated it?

I suspect it could be due to negative values in your scores. Did you normalize/scale your expression values? If so, zero-centering could be giving you negative normalized expression for some genes. Make sure that all expression values are non-negative.
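One way to rule the negative-score hypothesis in or out is to scan the score column of each context before building the tensor. The sketch below is a minimal illustration in plain Python. The helper `find_negative_scores` and the WT/KO values are hypothetical, and the thread does not confirm that negative scores were the actual cause of the ValueError.

```python
def find_negative_scores(scores_by_context):
    """Return {context: [(row_index, value), ...]} listing every negative score."""
    negatives = {}
    for context, scores in scores_by_context.items():
        bad = [(i, s) for i, s in enumerate(scores) if s < 0]
        if bad:
            negatives[context] = bad
    return negatives


# Hypothetical score columns, one list per context (e.g. the "LRscore" values).
scores_by_context = {
    "WT": [0.82, 0.10, 0.00],
    "KO": [0.45, -0.30, 0.71],
}

negatives = find_negative_scores(scores_by_context)
print(negatives)  # {'KO': [(1, -0.3)]}
```

An empty result means all scores are non-negative, in which case the cause is more likely elsewhere, for example in how the resource is formatted.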

Maybe @dbdimitrov could have more insights about this

dbdimitrov commented 5 months ago

Hi @alvarezprado, could it be that the resource is not formatted correctly, and that this somehow affects the dimensions of your data?

You could also use liana's internal mouse resource by passing "MouseConsensus" to the resource parameter.

alvarezprado commented 5 months ago

Thanks @earmingol and @dbdimitrov. I started from scratch following your advice and now everything works fine. I think the last error was probably related to wrong formatting.

I'm closing this issue, thank you once again for your help and for this fantastic tool!

saezlab / liana

liana_tensor_c2c error: All input columns must be contained in all dataframes included in 'context_dict' #149