KeyError: 'FN1_integrin_a5b1_complex'

Pedramto89 commented 4 months ago

I ran CellphoneDB v5 and got this error:

cpdb_file_path = '/Users/pedram/opt/anaconda3/envs/cpdb/out/v5.0.0/cellphonedb.zip'
meta_file_path = '/Users/pedram/opt/anaconda3/envs/cpdb/test_meta.txt'
counts_file_path = '/Users/pedram/opt/anaconda3/envs/cpdb/test_counts.txt'
out_path = '/Users/pedram/opt/anaconda3/envs/cpdb/out'

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

cpdb_results = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,                 # mandatory: CellphoneDB database zip file.
    meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,             # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData object
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
    score_interactions = True,                       # optional: whether to score interactions or not. 
    iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
    threshold = 0.1,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 6,                                     # number of threads to use in the analysis.
    debug_seed = 42,                                 # debug random seed: values >= 0 enable a fixed seed, a negative value disables it.
    result_precision = 3,                            # Sets the rounding for the mean values in significant_means.
    pvalue = 0.05,                                   # P-value threshold to employ for significance.
    separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                          # Path to save results.
    output_suffix = None                             # Replaces the timestamp in the output file names with a user-defined string (default: None).
    )

Reading user files...
The following user files were loaded successfully:
/Users/pedram/opt/anaconda3/envs/cpdb/test_counts.txt
/Users/pedram/opt/anaconda3/envs/cpdb/test_meta.txt
[ ][CORE][04/03/24-23:15:50][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:6 Precision:3
[ ][CORE][04/03/24-23:15:50][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][04/03/24-23:15:50][INFO] Running Real Analysis
[ ][CORE][04/03/24-23:15:50][INFO] Running Statistical Analysis
100%|███████████████████████████████████████| 1000/1000 [01:11<00:00, 13.90it/s]
[ ][CORE][04/03/24-23:17:02][INFO] Building Pvalues result
[ ][CORE][04/03/24-23:17:02][INFO] Building results

[ ][CORE][04/03/24-23:17:03][INFO] Scoring interactions: Filtering genes per cell type..
100%|██████████████████████████████████████████| 12/12 [00:00<00:00, 133.50it/s]
[ ][CORE][04/03/24-23:17:03][INFO] Scoring interactions: Calculating mean expression of each gene per group/cell type..

100%|██████████████████████████████████████████| 12/12 [00:00<00:00, 300.16it/s]
[ ][CORE][04/03/24-23:17:03][INFO] Scoring interactions: Calculating scores for all interactions and cell types..
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
/Users/pedram/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:103: RuntimeWarning: invalid value encountered in power
  geom = np.power(sub_prod, 1 / len(sub_values))
100%|█████████████████████████████████████████| 144/144 [00:03<00:00, 44.14it/s]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[8], line 3
      1 from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
----> 3 cpdb_results = cpdb_statistical_analysis_method.call(
      4     cpdb_file_path = cpdb_file_path,                 # mandatory: CellphoneDB database zip file.
      5     meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
      6     counts_file_path = counts_file_path,             # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData object
      7     counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
      8     score_interactions = True,                       # optional: whether to score interactions or not. 
      9     iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
     10     threshold = 0.1,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
     11     threads = 6,                                     # number of threads to use in the analysis.
     12     debug_seed = 42,                                 # debug randome seed. To disable >=0.
     13     result_precision = 3,                            # Sets the rounding for the mean values in significan_means.
     14     pvalue = 0.05,                                   # P-value threshold to employ for significance.
     15     separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
     16     debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
     17     output_path = out_path,                          # Path to save results.
     18     output_suffix = None                             # Replaces the timestamp in the output files by a user defined string in the  (default: None).
     19     )

File ~/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py:157, in call(cpdb_file_path, meta_file_path, counts_file_path, counts_data, output_path, microenvs_file_path, active_tfs_file_path, iterations, threshold, threads, debug_seed, result_precision, pvalue, subsampling, subsampling_log, subsampling_num_pc, subsampling_num_cells, separator, debug, output_suffix, score_interactions)
    154 if score_interactions:
    155     # Make sure all cell types are strings
    156     meta['cell_type'] = meta['cell_type'].apply(str)
--> 157     interaction_scores = scoring_utils.score_interactions_based_on_participant_expressions_product(
    158         cpdb_file_path, counts4scoring, means_result.copy(), separator, meta, threshold, "cell_type", threads)
    159     analysis_result['interaction_scores'] = interaction_scores
    161 file_utils.save_dfs_as_tsv(output_path, output_suffix, "statistical_analysis", analysis_result)

File ~/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:344, in score_interactions_based_on_participant_expressions_product(cpdb_file_path, counts, means, separator, metadata, threshold, cell_type_col_name, threads)
    340 cpdb_fms = scale_expression(cpdb_fmsh,
    341                             upper_range=10)
    343 # Step 5: calculate the ligand-receptor score.
--> 344 interaction_scores = score_product(matrix=cpdb_fms,
    345                                    means=means,
    346                                    separator=separator,
    347                                    interactions=interactions,
    348                                    id2name=id2name,
    349                                    threads=threads)
    350 return interaction_scores

File ~/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:290, in score_product(matrix, interactions, means, separator, id2name, threads)
    288 for ct_pair, lr_scores_filtered in results:
    289     interacting_pair2score = dict(zip(lr_scores_filtered['interacting_pair'], lr_scores_filtered['score']))
--> 290     interaction_scores[ct_pair] = [interacting_pair2score[id] for id in interaction_scores['interacting_pair']]
    292 return interaction_scores

File ~/opt/anaconda3/envs/CPDB-March2024/lib/python3.9/site-packages/cellphonedb/utils/scoring_utils.py:290, in <listcomp>(.0)
    288 for ct_pair, lr_scores_filtered in results:
    289     interacting_pair2score = dict(zip(lr_scores_filtered['interacting_pair'], lr_scores_filtered['score']))
--> 290     interaction_scores[ct_pair] = [interacting_pair2score[id] for id in interaction_scores['interacting_pair']]
    292 return interaction_scores

KeyError: 'FN1_integrin_a5b1_complex'

cakirb commented 4 months ago

Hi @Pedramto89,

To be able to debug the issue, could you send the input files you are using to contact@cellphonedb.org? If the files are too big to share via email, you can also send us the link to access them.

Best, Batu

cakirb commented 3 months ago

Hi @Pedramto89,

Thanks for sharing your input files with contact@cellphonedb.org. Your counts file shows that z-scaling was applied to the counts. Z-scaling must be avoided if you want to score interactions (see the related section in the CellPhoneDB documentation), so you should use log-normalised expression data instead.
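As a rough illustration of why negative (z-scaled) values are a problem here: the repeated RuntimeWarning in the log points at `np.power(sub_prod, 1 / len(sub_values))`, which looks like a geometric mean. The sketch below is not CellphoneDB's code; `geometric_mean` is a hypothetical helper that only mirrors that one line. It shows that a negative product raised to a fractional power gives NaN in NumPy. Whether those NaN values are what later leads to the KeyError is an assumption, not something confirmed in this thread.

```python
import warnings

import numpy as np


def geometric_mean(values):
    """Hypothetical helper: product ** (1 / n), mirroring the line in the warning."""
    sub_prod = np.prod(values)
    return np.power(sub_prod, 1 / len(values))


with warnings.catch_warnings():
    # Silence the "invalid value encountered in power" warning for the demo.
    warnings.simplefilter("ignore", RuntimeWarning)
    ok = geometric_mean(np.array([2.0, 8.0]))    # non-negative input
    bad = geometric_mean(np.array([-2.0, 8.0]))  # input containing a negative value

print(ok)   # a finite number
print(bad)  # nan, because the product is negative
```

With log-normalised data every value is zero or positive, so the product is never negative and this warning should not appear.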

Hope this helps you!

Best, Batu

RenhaoL commented 3 months ago

I got a similar issue with log-normalized data. Any idea on how to solve that? Thanks!

cakirb commented 3 months ago

Hi @RenhaoL,

Even if your data is log-normalised, you should make sure that no z-scaling has been applied to it, as I explained in the previous comment. This means your data must consist only of zeros and positive float numbers; it must not contain any negative values.
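A quick way to check this before running the analysis is sketched below with pandas. `summarize_counts` is a hypothetical helper, not part of CellphoneDB, and the tiny inline table only stands in for a real genes x cells counts file. For a real file, loading it with something like `pd.read_csv(counts_file_path, sep="\t", index_col=0)` is assumed to match a tab-separated layout.

```python
import io

import pandas as pd


def summarize_counts(counts: pd.DataFrame) -> dict:
    """Report whether a counts table contains only zeros and positive values."""
    values = counts.to_numpy(dtype=float)
    return {
        "min": float(values.min()),
        "n_negative": int((values < 0).sum()),
        "n_zero": int((values == 0).sum()),
        "looks_unscaled": bool((values >= 0).all()),
    }


# Tiny stand-in table (genes in rows, cells in columns) with one negative value.
demo_text = "Gene\tcell1\tcell2\nFN1\t0.0\t1.2\nITGA5\t-0.7\t0.4\n"
demo = pd.read_csv(io.StringIO(demo_text), sep="\t", index_col=0)

report = summarize_counts(demo)
print(report)
```

If `n_negative` is greater than zero, the matrix has most likely been z-scaled or otherwise centred and should be replaced by the log-normalised values.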

If your data has this structure and it still gives a similar error, you can send your input files to contact@cellphonedb.org, and we can check if there is any problem in your data or any bug in CellPhoneDB.

Best, Batu

RenhaoL commented 3 months ago

Thank you. Restarting the environment fixed my problem... Not sure what went wrong.

cakirb commented 1 month ago

Hello,

I'm closing this issue since the problem has been resolved. Feel free to reopen it if the same problem persists.

Best, Batu

ventolab / CellphoneDB

KeyError: 'FN1_integrin_a5b1_complex' #176