ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

KeyError: 'significant_means' #177

Open dhairya02 opened 3 months ago

dhairya02 commented 3 months ago

Hi, I am running cpdb_statistical_analysis_method from CellPhoneDB. My AnnData object has shape (6890, 2000), and I call the method with the following parameters:

import os
import pandas as pd
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

# `subsample` is defined earlier in the notebook ('Control4003' in the log below).
cpdb_file_path = 'Resources/cellphonedb.zip'
meta_file_path = f'Data/{subsample}/CPDB_data/metadata.txt'
counts_file_path = f'Data/{subsample}/CPDB_data/counts.h5ad'
out_path = f'Data/{subsample}/CPDB_results/'

os.makedirs(out_path, exist_ok=True)
metadata = pd.read_csv(meta_file_path, sep = '\t')

cpdb_results = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,                 # mandatory: CellPhoneDB database zip file.
    meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,             # mandatory: normalized count matrix.
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
    iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
    threshold = 0.1,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 40,                                    # number of threads to use in the analysis.
    debug_seed = 42,                                 # debug random seed. Values >= 0 enable it (see the warning in the log below).
    result_precision = 3,                            # Sets the rounding for the mean values in significant_means.
    pvalue = 0.05,                                   # P-value threshold to employ for significance.
    subsampling = False,                             # To enable subsampling the data (geometric sketching).
    subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                        # Number of components to subsample via geometric sketching (default: 100).
    subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                          # Path to save results.
    output_suffix = subsample                        # Replaces the timestamp in the output files by a user-defined string (default: None).
)
I am getting the following error:
Reading user files...
The following user files were loaded successfully:
Data/Control4003/CPDB_data/counts.h5ad
Data/Control4003/CPDB_data/metadata.txt
[ ][CORE][23/03/24-20:24:37][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:40 Precision:3
[ ][CORE][23/03/24-20:24:37][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][23/03/24-20:24:37][INFO] No CellphoneDB interactions found in this input.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[6], line 11
      8 os.makedirs(out_path, exist_ok=True)
      9 metadata = pd.read_csv(meta_file_path, sep = '\t')
---> 11 cpdb_results = cpdb_statistical_analysis_method.call(
     12     cpdb_file_path = cpdb_file_path,                 # mandatory: CellPhoneDB database zip file.
     13     meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
     14     counts_file_path = counts_file_path,             # mandatory: normalized count matrix.
     15     counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
     16     iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
     17     threshold = 0.1,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
     18     threads = 40,                                    # number of threads to use in the analysis.
     19     debug_seed = 42,                                 # debug randome seed. To disable >=0.
     20     result_precision = 3,                            # Sets the rounding for the mean values in significan_means.
     21     pvalue = 0.05,                                   # P-value threshold to employ for significance.
     22     subsampling = False,                             # To enable subsampling the data (geometri sketching).
     23     subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
     24     subsampling_num_pc = 100,                        # Number of componets to subsample via geometric skectching (dafault: 100).
     25     subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
     26     separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
     27     debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
     28     output_path = out_path,                          # Path to save results.
     29     output_suffix = subsample                        # Replaces the timestamp in the output files by a user defined string in the  (default: None).
     30 )

File /gpfs/share/apps/anaconda3/gpu/5.2.0/envs/conda_tsirigoslab_transloc_env/lib/python3.8/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py:148, in call(cpdb_file_path, meta_file_path, counts_file_path, counts_data, output_path, microenvs_file_path, active_tfs_file_path, iterations, threshold, threads, debug_seed, result_precision, pvalue, subsampling, subsampling_log, subsampling_num_pc, subsampling_num_cells, separator, debug, output_suffix, score_interactions)
    124     counts = ss.subsample(counts)
    126 analysis_result = cpdb_statistical_analysis_complex_method.call(meta.copy(),
    127                                                                 counts,
    128                                                                 counts_relations,
   (...)
    145                                                                 output_path
    146                                                                 )
--> 148 significant_means = analysis_result['significant_means']
    149 max_rank = significant_means['rank'].max()
    150 significant_means['rank'] = significant_means['rank'].apply(lambda rank: rank if rank != 0 else (1 + max_rank))

KeyError: 'significant_means'

Can you please help me understand what this error means?

cakirb commented 3 months ago

Hi @dhairya02,

To be able to debug the issue, could you send the input files you are using to contact@cellphonedb.org? If the files are too big to share via email, you can also send us the link to access them.

Best, Batu

cakirb commented 1 month ago

Hi @dhairya02,

Sorry we couldn't help further, since we haven't received your input files. However, as mentioned in #186, which reports the same error, it's possible that your analysis finds no CellPhoneDB interactions (your log does report "No CellphoneDB interactions found in this input."). One possible cause is that your genes come from a different organism rather than human. If that is the case, you should convert the genes to their corresponding human orthologues. You can find details in our documentation: https://cellphonedb.readthedocs.io/en/latest/RESULTS-DOCUMENTATION.html#counts-file
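
One quick way to test the non-human-genes hypothesis is to measure how many gene names in the counts matrix match a list of human gene symbols. The snippet below is only a minimal sketch: the function name symbol_overlap and the small example lists are hypothetical, and in practice the gene names would come from the counts file (for example adata.var_names) and the reference from any trusted list of human HGNC symbols. A low exact-match fraction combined with a high "matches only after upper-casing" fraction suggests mouse-style symbols such as Cd74. Upper-casing is used here purely as a diagnostic, not as an orthologue conversion.

```python
def symbol_overlap(count_genes, reference_symbols):
    """Return two fractions of count_genes:
    (exact matches in reference_symbols, matches found only after upper-casing)."""
    reference = set(reference_symbols)
    genes = list(count_genes)
    if not genes:
        return 0.0, 0.0
    exact = sum(g in reference for g in genes)
    case_only = sum(g not in reference and g.upper() in reference for g in genes)
    return exact / len(genes), case_only / len(genes)


# Hypothetical example values, not taken from the reported dataset.
human_symbols = ["CD74", "MIF", "CXCR4", "CXCL12"]
mouse_style = ["Cd74", "Mif", "Cxcr4", "Actb"]

exact, case_only = symbol_overlap(mouse_style, human_symbols)
print(f"exact matches: {exact:.2f}, matches only after upper-casing: {case_only:.2f}")
```

If the exact-match fraction is close to zero, the identifiers in the counts matrix probably do not match the counts_data setting or the expected organism.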

Hope this helps!

Best, Batu
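
For anyone hitting the same traceback: the KeyError is raised inside the library right after the "No CellphoneDB interactions found in this input." log line, so the message itself says little about the cause. A user-side wrapper can turn it into a clearer error. This is a minimal sketch, not part of CellPhoneDB: run_with_diagnostics and fake_analysis are hypothetical names, and the wrapper assumes only that the failure surfaces as KeyError('significant_means'), as in the traceback above.

```python
def run_with_diagnostics(run_analysis):
    """Call run_analysis() and re-raise the 'significant_means' KeyError
    with a more descriptive message; other errors pass through unchanged."""
    try:
        return run_analysis()
    except KeyError as err:
        if err.args and err.args[0] == "significant_means":
            raise RuntimeError(
                "No CellphoneDB interactions were found, so no significant_means "
                "table was produced. Check that the gene identifiers are human "
                "and match the counts_data setting."
            ) from err
        raise


# Stand-in for the real call, used here only to demonstrate the wrapper.
def fake_analysis():
    raise KeyError("significant_means")


try:
    run_with_diagnostics(fake_analysis)
except RuntimeError as err:
    print(err)
```

With the real library, the call would be passed in as a zero-argument callable, for example a lambda wrapping cpdb_statistical_analysis_method.call(...) with the parameters shown in the original post.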