ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

Attempted to generate a database using Arabidopsis data, but failed. Please help & thanks #112

Closed: bitcometz closed this issue 7 months ago

bitcometz commented 1 year ago

Hello, I am trying to build an Arabidopsis database for analysis with CellPhoneDB, so I first made four files:

Then I used the following command:

cellphonedb database generate --user-interactions interaction_input.csv \
   --user-protein protein_input.csv \
   --user-gene gene_input.csv \
   --user-complex complex_input.csv \
   --user-interactions-only

but I got the following error log:

[ ][APP][21/04/23-16:48:11][WARNING] There are some proteins or complexes not interacting properly: `AAP3, AAP5, ABCA1, ABCB19, ABCG26, ABCG31, ABCG40, ABI1, ACBP1, ACBP3, ACBP6, ADPG1, AGAL1, AGAL2, AGB1, AGP1, AGP10, AGP11, AGP12, AGP13, AGP16, AGP17, AGP18, AGP19, AGP2, AGP20, AGP21, AGP22, AGP24, AGP25, AGP26, AGP3, AGP30, AGP31, AGP41, AGP5, AGP9, AIR3, AKINBETA1, ALA1, ALDH22A1, ALE1, ALMT1, AMT13, AMY1, ANK, ANN5, ANX1, ANX2, AO1, ARA6, ARA7, ARCK1, ARF1, ARFA1D, AT1G01980, AT1G02810, AT1G03103, AT1G03700, AT1G04300, AT1G07550, AT1G07560, AT1G07650, AT1G08230, AT1G08340, AT1G08590, AT1G09390, AT1G11050, AT1G11590, AT1G11820, AT1G12460, AT1G13230, AT1G13750, AT1G14040, AT1G14390, AT1G15190, AT1G16260, AT1G16950, AT1G17230, AT1G17910, AT1G18280, AT1G19390, AT1G22690, AT1G23140, AT1G23410, AT1G24140, AT1G25390, AT1G26390, AT1G26420, AT1G27190, AT1G29290, AT1G29670, AT1G29720, AT1G30700, AT1G30710, AT1G30760, AT1G30870, AT1G31550, AT1G31670, AT1G32090, AT1G32860, AT1G34047, AT1G34110, AT1G34300, AT1G34510, AT1G35350, ......

[ ][CORE][21/04/23-16:48:11][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][21/04/23-16:48:11][INFO] Using custom database at path/out/cellphonedb_user_2023-04-21-16_48.db
[ ][APP][21/04/23-16:48:11][INFO] Collecting protein
[ ][CORE][21/04/23-16:48:11][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][21/04/23-16:48:11][INFO] Using custom database at path/out/cellphonedb_user_2023-04-21-16_48.db
[ ][CORE][21/04/23-16:48:11][INFO] Collecting protein
[ ][APP][21/04/23-16:48:11][INFO] Collecting gene
[ ][CORE][21/04/23-16:48:11][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][21/04/23-16:48:11][INFO] Using custom database at path/out/cellphonedb_user_2023-04-21-16_48.db
[ ][CORE][21/04/23-16:48:11][INFO] Collecting gene
[ ][APP][21/04/23-16:48:11][INFO] Collecting complex
[ ][CORE][21/04/23-16:48:11][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][21/04/23-16:48:11][INFO] Using custom database at path/out/cellphonedb_user_2023-04-21-16_48.db
[ ][CORE][21/04/23-16:48:11][INFO] Collecting complex
[ ][APP][21/04/23-16:48:11][INFO] Collecting interaction
[ ][CORE][21/04/23-16:48:11][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][21/04/23-16:48:11][INFO] Using custom database at path/out/cellphonedb_user_2023-04-21-16_48.db
[ ][CORE][21/04/23-16:48:11][INFO] Collecting interaction
Traceback (most recent call last):
  File "/conda_env/cellphonedb/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id_cp_interaction'

My software info:

$ pip show cellphoneDB
Name: CellPhoneDB
Version: 3.0.0      (version 2.1.7 does not work either)
Summary: Inferring cell-cell communication
Home-page: https://cellphonedb.org
Author: TeichLab/VentoLab
Author-email: contact@cellphonedb.org
License: MIT

$ pip show pandas
Name: pandas
Version: 1.1.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: 
Author-email: 
License: BSD

Thanks!!!

ktroule commented 1 year ago

Thanks for using CellPhoneDB.

We would recommend using the latest version of the tool. Check the main page of the repository for more information.

From your error, it seems that your database is not fully consistent, i.e. some genes/proteins/interactions are not found in some of the required tables:

[ ][APP][21/04/23-16:48:11][WARNING] There are some proteins or complexes not interacting properly: AAP3, AAP5, ...
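To see which entries a warning like this may be pointing at before running `cellphonedb database generate`, a rough local check can help. The Python sketch below is not CellPhoneDB's own code and only approximates the warning: it assumes that "not interacting" means a protein's `uniprot` value or a complex's `complex_name` never appears in the `partner_a` or `partner_b` column of interaction_input.csv. The function names `read_rows` and `non_interacting` are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Set


def read_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines (header first) into a list of row dictionaries."""
    return list(csv.DictReader(lines))


def non_interacting(protein_lines: Iterable[str],
                    complex_lines: Iterable[str],
                    interaction_lines: Iterable[str]) -> Set[str]:
    """Return protein IDs and complex names that no interaction uses as a partner.

    Assumption: "interacting" means appearing in partner_a or partner_b.
    """
    declared = {row["uniprot"] for row in read_rows(protein_lines)}
    declared |= {row["complex_name"] for row in read_rows(complex_lines)}

    used: Set[str] = set()
    for row in read_rows(interaction_lines):
        used.add(row["partner_a"])
        used.add(row["partner_b"])

    # Anything declared but never referenced by an interaction.
    return declared - used
```

To use it, open each file with `open(path, newline="")` and pass the three handles in the order protein, complex, interaction. With the minimal example further down in this thread, this sketch reports only `SBTI1.1_PSKR1`, which is defined as a complex but not used in any interaction.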

I've spotted an error in your interaction_input file: the partner columns (partner_a and partner_b) should contain your equivalent of the UniProt ID, not the protein_name.
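One way to catch this mistake is to flag every partner value that is neither a `uniprot` in protein_input.csv nor a `complex_name` in complex_input.csv. The helper below is a hypothetical sketch, not part of CellPhoneDB. It assumes that a partner may be either a protein ID or a complex name (worth confirming against the CellPhoneDB documentation), and when a flagged value matches a `protein_name`, it suggests the ID that probably belongs there instead.

```python
import csv
from typing import Iterable, List, Optional, Tuple

# (id_cp_interaction, column, offending value, suggested uniprot-style ID or None)
Problem = Tuple[str, str, str, Optional[str]]


def unknown_partners(protein_lines: Iterable[str],
                     complex_lines: Iterable[str],
                     interaction_lines: Iterable[str]) -> List[Problem]:
    """List partner values that are not declared as a protein ID or a complex name."""
    proteins = list(csv.DictReader(protein_lines))
    known = {row["uniprot"] for row in proteins}
    known |= {row["complex_name"] for row in csv.DictReader(complex_lines)}

    # If a partner is actually a protein_name, remember the ID it should have been.
    name_to_id = {row["protein_name"]: row["uniprot"] for row in proteins}

    problems: List[Problem] = []
    for row in csv.DictReader(interaction_lines):
        for column in ("partner_a", "partner_b"):
            value = row[column]
            if value not in known:
                problems.append((row["id_cp_interaction"], column, value,
                                 name_to_id.get(value)))
    return problems
```

Call it with the three files opened via `open(path, newline="")`; an empty list means every partner resolves to a declared protein or complex.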

Here you have a minimal example generated from your data that works:

==> complex_input.csv <==
complex_name,uniprot_1,uniprot_2,uniprot_3,uniprot_4,uniprot_5,version
SBTI1.1_PSKR1,UAT1G01900,UAT2G02220,,,,new

==> gene_input.csv <==
gene_name,uniprot,hgnc_symbol,ensembl
SBTI1.1,UAT1G01900,SBTI1.1,EAT1G01900
PSKR1,UAT2G02220,PSKR1,EAT2G02220
RPK2,UAT3G02130,RPK2,EAT3G02130

==> interaction_input.csv <==
id_cp_interaction,partner_a,partner_b,protein_name_a,protein_name_b,annotation_strategy,source
CPI-SS111111112,UAT1G01900,UAT2G02220,,,curated,uniprot

==> protein_input.csv <==
uniprot,protein_name
UAT1G01900,SBTI1.1
UAT2G02220,PSKR1
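If `generate` still fails with a pandas `KeyError` such as `'id_cp_interaction'`, one possible cause (an assumption, not something confirmed in this thread) is a header cell that does not exactly match the expected column name, for example because a spreadsheet program added a UTF-8 byte-order mark or stray spaces. The hypothetical `check_header` helper below compares a header line with the column names used in the minimal example above; those lists are copied from the example and may not be the full set CellPhoneDB accepts.

```python
import csv
from typing import Dict, List

# Column names copied from the minimal example; not an official schema.
EXPECTED_COLUMNS: Dict[str, List[str]] = {
    "complex_input.csv": ["complex_name", "uniprot_1", "uniprot_2", "uniprot_3",
                          "uniprot_4", "uniprot_5", "version"],
    "gene_input.csv": ["gene_name", "uniprot", "hgnc_symbol", "ensembl"],
    "interaction_input.csv": ["id_cp_interaction", "partner_a", "partner_b",
                              "protein_name_a", "protein_name_b",
                              "annotation_strategy", "source"],
    "protein_input.csv": ["uniprot", "protein_name"],
}


def check_header(filename: str, header_line: str) -> List[str]:
    """Describe every expected column that the header line does not match exactly."""
    header = next(csv.reader([header_line]))
    # A UTF-8 byte-order mark or stray spaces can hide an otherwise correct name.
    cleaned = [cell.lstrip("\ufeff").strip() for cell in header]
    problems = []
    for column in EXPECTED_COLUMNS[filename]:
        if column in header:
            continue
        if column in cleaned:
            problems.append(f"{column}: present but padded (BOM or whitespace)")
        else:
            problems.append(f"{column}: missing")
    return problems
```

For example, read the first line with `open("interaction_input.csv", newline="", encoding="utf-8")` and pass it to `check_header("interaction_input.csv", line)`. Opening with `encoding="utf-8"` rather than `"utf-8-sig"` keeps any byte-order mark visible to the check.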

Kind regards

bitcometz commented 1 year ago

@ktroule, thanks for your help!!! We will try the method you provided as soon as possible and let you know how it goes.

Best