ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

Generating user-specific database produces pseudo-interactions that are not passed in. #82

Closed (macelik closed 1 year ago)

macelik commented 1 year ago

Hello, I am building a custom CellPhoneDB database, but many pairs that are not present in my interactions.csv file appear in the generated database, and many pairs that are present in the file are missing. My interaction file contains about 7000 pairs.

The command I am using to generate the database is:

cellphonedb database generate --user-interactions combined_interactions.csv --user-interactions-only --user-protein prot_user.csv --user-gene gene_user.csv --result-path combined8

My interaction table looks like this:

partner_a partner_b protein_name_a protein_name_b annotation_strategy source
A0A1B0GTJ6 P22888 LOC101059948 LHCGR user_curated user
A4D1S0 O96014 WNT11 KLRG2 user_curated user
A4D1S0 P41273 TNFSF9 KLRG2 user_curated user

My protein.csv file looks like this:

uniprot protein_name
P01023 A2M_HUMAN
Q07954 LRP1_HUMAN
Q16613 AANAT_HUMAN

My gene table is below. If a protein did not have a corresponding Ensembl ID, I made one up to keep the entry, such as ENSG0000000001, ENSG0000000002, etc.

gene_name uniprot hgnc_symbol ensembl
A2M P01023 A2M ENSG00000175899
LRP1 Q07954 LRP1 ENSG00000123384
AANAT Q16613 AANAT ENSG00000129673
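
One way to narrow down a problem like this before generating the database is to cross-check the three input tables against each other. The following is a minimal sketch in plain Python, not part of CellPhoneDB: the function name `check_inputs` and the data layout are hypothetical, the first interaction pair is made up for illustration, and it is only an assumption that partners missing from the protein or gene tables, or placeholder Ensembl IDs, contribute to pairs being dropped. The one firm fact used is that real human Ensembl gene IDs are `ENSG` followed by 11 digits, so a placeholder such as ENSG0000000001 (10 digits) does not match that pattern.

```python
import re

# Human Ensembl gene IDs are "ENSG" followed by exactly 11 digits.
ENSEMBL_GENE_RE = re.compile(r"^ENSG\d{11}$")


def check_inputs(interactions, proteins, genes):
    """Cross-check the three user tables (hypothetical helper).

    interactions: list of (partner_a, partner_b) UniProt accessions
    proteins:     set of UniProt accessions listed in the protein table
    genes:        dict mapping UniProt accession -> Ensembl gene ID
    """
    partners = {p for pair in interactions for p in pair}
    return {
        # Partners used in an interaction but absent from the protein table.
        "missing_protein": sorted(partners - proteins),
        # Partners used in an interaction but absent from the gene table.
        "missing_gene": sorted(partners - set(genes)),
        # Gene rows whose Ensembl ID does not have the standard shape.
        "malformed_ensembl": sorted(
            u for u, e in genes.items() if not ENSEMBL_GENE_RE.match(e)
        ),
    }


# Toy data built from the excerpts above; the first pair is illustrative only.
interactions = [("P01023", "Q07954"), ("A4D1S0", "O96014")]
proteins = {"P01023", "Q07954", "Q16613"}
genes = {
    "P01023": "ENSG00000175899",
    "Q07954": "ENSG00000123384",
    "A4D1S0": "ENSG0000000001",  # made-up placeholder, as described above
}

report = check_inputs(interactions, proteins, genes)
for problem, accessions in report.items():
    print(problem, accessions)
```

Running the same check on the full CSV files (for example after loading them with the `csv` module) would list every accession that appears in an interaction but has no protein or gene row.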

macelik commented 1 year ago

Hi again. It turns out that this issue occurs when both partners in an interaction are ligands, or both are receptors.

Is there any way to force CellPhoneDB to keep the edges in my input list as they are, without reversing them? I have already tried specifying the edges in my input file in a specific order (e.g., protein A first, protein B second), but this does not seem to prevent the edges from being reversed.

luzgaral commented 1 year ago

Hi,

Apologies, but CellphoneDB does not use interaction directionality (-> or <-), and the order of the proteins does not alter the results, so CellphoneDB does not preserve the input order when processing the user database.

We understand that you would like to always see the ligand as proteinA and the receptor as proteinB, but rest assured that this will not affect the results, as all possible combinations are always tested (see the image here).

Hope this helps,

Luz
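
Since partner order is not preserved, comparing the input list with the generated database line by line will report reversed pairs as both "missing" and "new". A minimal sketch of an order-insensitive comparison is shown below. It is plain Python and does not use any CellPhoneDB API; the helper names `unordered` and `compare_pairs` are hypothetical, and the `db_pairs` list is an invented stand-in for whatever pairs are read back from the generated database.

```python
def unordered(pairs):
    """Normalise each (a, b) pair so that (a, b) and (b, a) compare equal."""
    return {tuple(sorted(pair)) for pair in pairs}


def compare_pairs(user_pairs, db_pairs):
    """Return (missing, unexpected) pairs, ignoring partner order.

    missing:    pairs in the user input that are absent from the database
    unexpected: pairs in the database that were never in the user input
    """
    user, db = unordered(user_pairs), unordered(db_pairs)
    return sorted(user - db), sorted(db - user)


# Pairs from the interaction table quoted in this thread.
user_pairs = [
    ("A0A1B0GTJ6", "P22888"),
    ("A4D1S0", "O96014"),
    ("A4D1S0", "P41273"),
]

# Hypothetical database content: same pairs, two of them reversed.
db_pairs = [
    ("P22888", "A0A1B0GTJ6"),
    ("O96014", "A4D1S0"),
    ("A4D1S0", "P41273"),
]

missing, unexpected = compare_pairs(user_pairs, db_pairs)
print("missing:", missing)
print("unexpected:", unexpected)
```

With this comparison, reversed pairs no longer count as differences, so anything still reported as missing or unexpected points to a problem other than partner order.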