ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

"Some cells in meta did not exist in counts" error #157

Closed by ndrubins 7 months ago

ndrubins commented 7 months ago

Hi,

I'm trying to run the statistical analysis method on my n_obs × n_vars = 130646 × 17071 log-normalized counts data and I'm getting the error below:

>>> import pandas as pd
>>> import anndata
>>> import pickle
>>> from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
>>> cpdb_file = '/home/rd/cpdb_dbs/v5.0.0/cellphonedb.zip'
>>> meta_file = '/home/rd/project_10_2023/cellPhoneDB_statistical/obs.tsv'
>>> counts_file = '/home/rd/project_10_2023/cellPhoneDB_statistical/counts.h5ad'
>>> microenvs_file = '/home/rd/project_10_2023/cellPhoneDB_statistical/microenvironment.tsv'
>>> out_dir = '/home/rd/project_10_2023/cellPhoneDB_statistical'

>>> metadata = pd.read_csv(meta_file, sep = '\t')
>>> adata = anndata.read_h5ad(counts_file)
>>> list(adata.obs.index).sort() == list(metadata['barcode_sample']).sort()
True

>>> cpdb_results = cpdb_statistical_analysis_method.call(
...   cpdb_file_path = cpdb_file,
...   meta_file_path = meta_file,
...   counts_file_path = counts_file,
...   counts_data = 'gene_name',
...   microenvs_file_path = microenvs_file,
...   score_interactions = True,
...   iterations = 1000,
...   threshold = 0.1,
...   threads = 5,
...   debug_seed = 42,
...   result_precision = 3,
...   pvalue = 0.05,
...   subsampling = False,
...   subsampling_log = False,
...   subsampling_num_pc = 100,
...   subsampling_num_cells = 1000,
...   separator = '|',
...   debug = False,
...   output_path = out_dir,
...   output_suffix = None
...   )
Reading user files...
The following user files were loaded successfully:
/home/rd/project_10_2023/cellPhoneDB_statistical/counts.h5ad
/home/rd/project_10_2023/cellPhoneDB_statistical/obs.tsv
/home/rd/project_10_2023/cellPhoneDB_statistical/microenvironment.tsv
[ ][CORE][24/11/23-11:34:03][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:5 Precision:3
[ ][CORE][24/11/23-11:34:03][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][24/11/23-11:34:09][INFO] No CellphoneDB interactions found in this input.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rd/miniconda/envs/cpdb/lib/python3.9/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py", line 148, in call
    significant_means = analysis_result['significant_means']
KeyError: 'significant_means'

My adata:

>>> adata
AnnData object with n_obs × n_vars = 130646 × 17071
    obs: 'barcode_sample', 'cell_type'
    var: 'gene_name'
>>> adata.X
<130646x17071 sparse matrix of type '<class 'numpy.float64'>'
    with 262689547 stored elements in Compressed Sparse Column format>
>>> adata.obs
                                       barcode_sample  cell_type
AAACCTGAGCTAAACA_yt_1     AAACCTGAGCTAAACA_young.tr_1       ILC2
AAACCTGAGGAGCGAG_yt_1     AAACCTGAGGAGCGAG_young.tr_1      mac_1
AAACCTGCAATAACGA_yt_1     AAACCTGCAATAACGA_young.tr_1       cDC1
AAACCTGCACCTGGTG_yt_1     AAACCTGCACCTGGTG_young.tr_1      mac_1
AAACCTGCAGTTAACC_yt_1     AAACCTGCAGTTAACC_young.tr_1     B_cell
...                                               ...        ...
TTTGTCAGTGTTTGTG_on_4  TTTGTCAGTGTTTGTG_CD45neg.old_4      fib_1
TTTGTCATCCTAAGTG_on_4  TTTGTCATCCTAAGTG_CD45neg.old_4      fib_1
TTTGTCATCGCATGGC_on_4  TTTGTCATCGCATGGC_CD45neg.old_4        cap
TTTGTCATCGGCGGTT_on_4  TTTGTCATCGGCGGTT_CD45neg.old_4       vein
TTTGTCATCTGCTTGC_on_4  TTTGTCATCTGCTTGC_CD45neg.old_4      cil_1
>>> adata.var
                   gene_name
0610009B22Rik  0610009B22Rik
0610009E02Rik  0610009E02Rik
0610009L18Rik  0610009L18Rik
0610010F05Rik  0610010F05Rik
0610010K14Rik  0610010K14Rik
...                      ...
Zxdc                    Zxdc
Zyg11b                Zyg11b
Zyx                      Zyx
Zzef1                  Zzef1
Zzz3                    Zzz3
>>> metadata
               barcode_sample  cell_type
0       AAACCTGAGCTAAACA_yt_1       ILC2
1       AAACCTGAGGAGCGAG_yt_1      mac_1
2       AAACCTGCAATAACGA_yt_1       cDC1
3       AAACCTGCACCTGGTG_yt_1      mac_1
4       AAACCTGCAGTTAACC_yt_1     B_cell
...                       ...        ...
130641  TTTGTCAGTGTTTGTG_on_4      fib_1
130642  TTTGTCATCCTAAGTG_on_4      fib_1
130643  TTTGTCATCGCATGGC_on_4        cap
130644  TTTGTCATCGGCGGTT_on_4       vein
130645  TTTGTCATCTGCTTGC_on_4      cil_1

[130646 rows x 2 columns]

No output files are produced.

Any idea?
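
A side note on the barcode check in the session above: `list.sort()` sorts in place and returns `None`, so comparing two `.sort()` calls evaluates `None == None` and prints `True` whatever the lists contain. A minimal sketch of a check that does compare the barcodes follows. The helper name `compare_barcodes` and the toy barcodes are made up for illustration and are not part of CellphoneDB.

```python
def compare_barcodes(counts_barcodes, meta_barcodes):
    """Return the barcodes that appear on only one side of the comparison."""
    counts_set = set(counts_barcodes)
    meta_set = set(meta_barcodes)
    return {
        "only_in_counts": sorted(counts_set - meta_set),
        "only_in_meta": sorted(meta_set - counts_set),
    }


# list.sort() returns None, so this comparison is True even though the lists differ.
vacuous = [3, 1].sort() == ["x"].sort()
print(vacuous)

# Toy barcodes (hypothetical, shortened) with one mismatch on each side.
diff = compare_barcodes(
    ["AAAC_yt_1", "AAAG_yt_1"],
    ["AAAC_yt_1", "TTTG_on_4"],
)
print(diff)
```

With the objects from the session, the call would be `compare_barcodes(adata.obs.index, metadata['barcode_sample'])`; two empty lists mean the barcodes match.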

datasome commented 7 months ago

Hi ndrubins,

Thank you for using CellphoneDB. It looks as if you're using mouse gene names, but CellphoneDB works with human genes only. That would also explain the "No CellphoneDB interactions found in this input" message in your log. Please see the advice in the 'Counts file' section of https://cellphonedb.readthedocs.io/en/latest/RESULTS-DOCUMENTATION.html.
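
A quick way to spot this before running the analysis is to look at the casing of the gene symbols. This is only a rough sketch, assuming the usual naming conventions: human symbols are mostly all uppercase (`ZYX`), mouse symbols mostly have a capital first letter followed by lowercase (`Zyx`). The helper `fraction_uppercase` is hypothetical and not part of CellphoneDB, and some human symbols (for example `C1orf...` names) do contain lowercase letters, so treat the result as a hint rather than a test.

```python
def fraction_uppercase(symbols):
    """Fraction of gene symbols that contain no lowercase letters."""
    symbols = list(symbols)
    if not symbols:
        return 0.0
    n_upper = sum(1 for s in symbols if s == s.upper())
    return n_upper / len(symbols)


# Symbols as they appear in adata.var in this issue (mouse-style casing).
mouse_like = ["Zxdc", "Zyg11b", "Zyx", "Zzef1", "Zzz3"]
# The same symbols uppercased, for illustration only. Uppercasing is not
# a substitute for a proper mouse-to-human ortholog mapping.
human_like = [s.upper() for s in mouse_like]

print(fraction_uppercase(mouse_like))  # 0.0
print(fraction_uppercase(human_like))  # 1.0
```

On the real data this would be `fraction_uppercase(adata.var['gene_name'])`; a value near 0 suggests the symbols are not human-style.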

Best,

Robert.

ndrubins commented 7 months ago

Sorry, I missed that.

Thanks.