ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

No CellphoneDB interactions found in this input #195

Open fafa92 opened 2 weeks ago

fafa92 commented 2 weeks ago

Hi,

Thanks for the great package. I'm trying to use CellPhoneDB v5 to find interactions in my dataset. I'm running the code below on the dataset linked here, and the output is "No CellphoneDB interactions found in this input". My input is already log-transformed. Any help would be appreciated.

counts: https://drive.google.com/file/d/1YE-G2LF05uCb7-Cl0IW-9kiKeaj4VWuB/view?usp=sharing
metadata: https://drive.google.com/file/d/1GTsiAz9CcJKa1WhQ4ZvQ8wGCxmCaQ3Ml/view?usp=sharing

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

# Define file paths
cpdb_file_path = '/content/cellphonedb.zip'
meta_file_path = '/content/metadata.txt'
counts_file_path = '/content/counts.txt'
output_path = 'cellphonedb_results'

# Run CellPhoneDB statistical analysis
cpdb_results = cpdb_statistical_analysis_method.call(
    cpdb_file_path=cpdb_file_path,
    meta_file_path=meta_file_path,
    counts_file_path=counts_file_path,
    counts_data='ensembl',
    threshold=0.1,
    output_path=output_path,
    subsampling_num_cells=1000
)

log output:

Reading user files...
The following user files were loaded successfully:
/content/counts.txt
/content/metadata.txt
[ ][CORE][14/06/24-04:51:21][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:-1 Threads:4 Precision:3
[ ][CORE][14/06/24-04:51:21][INFO] No CellphoneDB interactions found in this input.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-2b6c25d33580> in <cell line: 10>()
      8 
      9 # Run CellPhoneDB statistical analysis
---> 10 cpdb_results = cpdb_statistical_analysis_method.call(
     11     cpdb_file_path=cpdb_file_path,
     12     meta_file_path=meta_file_path,

/usr/local/lib/python3.10/dist-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py in call(cpdb_file_path, meta_file_path, counts_file_path, counts_data, output_path, microenvs_file_path, active_tfs_file_path, iterations, threshold, threads, debug_seed, result_precision, pvalue, subsampling, subsampling_log, subsampling_num_pc, subsampling_num_cells, separator, debug, output_suffix, score_interactions)
    146                                                                     )
    147 
--> 148     significant_means = analysis_result['significant_means']
    149     max_rank = significant_means['rank'].max()
    150     significant_means['rank'] = significant_means['rank'].apply(lambda rank: rank if rank != 0 else (1 + max_rank))

KeyError: 'significant_means'

cakirb commented 2 weeks ago

Hi @fafa92,

I've reviewed your files and noticed that the counts are not in the format required by CellPhoneDB. Please ensure that any normalisation procedure you use does not transform zeros into any other value. Therefore, you should apply log-normalisation only, without z-scaling.
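As a rough way to spot this before running the analysis, the sketch below streams a tab-separated genes-by-cells table and reports its minimum value and the fraction of exact zeros. The file layout (first row of cell barcodes, first column of gene identifiers), the function names, and the "negative values or no zeros" heuristic are assumptions made for illustration; none of this is part of CellPhoneDB.

```python
import csv
import io


def summarize_counts(handle, delimiter="\t"):
    """Stream a genes-by-cells table; return (minimum value, fraction of zeros)."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # skip the header row of cell barcodes (assumed layout)
    minimum, zeros, total = float("inf"), 0, 0
    for row in reader:
        for field in row[1:]:  # row[0] is assumed to be the gene identifier
            value = float(field)
            minimum = min(minimum, value)
            zeros += value == 0.0
            total += 1
    zero_fraction = zeros / total if total else 0.0
    return minimum, zero_fraction


def looks_scaled(minimum, zero_fraction):
    """Heuristic only: negative values or no exact zeros hint at z-scaling."""
    return minimum < 0 or zero_fraction == 0.0


# Toy inputs standing in for counts.txt (hypothetical values).
log_normalised = "Gene\tc1\tc2\nENSG1\t0\t1.2\nENSG2\t0.7\t0\n"
z_scaled = "Gene\tc1\tc2\nENSG1\t-0.7\t0.7\nENSG2\t0.7\t-0.7\n"

ok_summary = summarize_counts(io.StringIO(log_normalised))
bad_summary = summarize_counts(io.StringIO(z_scaled))
print(looks_scaled(*ok_summary), looks_scaled(*bad_summary))
```

On a real file, passing `open('/content/counts.txt', newline='')` in place of the `io.StringIO` object would run the same check without loading the whole matrix into memory.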

Hope this helps!

Best, Batu

fafa92 commented 2 weeks ago

Hi @cakirb,

I appreciate your suggestion. That solved the problem. I have another question regarding my dataset. My AnnData object is 372081 (cells) × 3000 (genes). When I run it with the command below, it uses all of my memory (about 300 GB) until it crashes.

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method


# Define file paths
cpdb_file_path = '/content/cellphonedb.zip'
meta_file_path = '/content/metadata.txt'
counts_file_path = '/content/counts.txt'
output_path = 'cellphonedb_results'

# Run CellPhoneDB statistical analysis
cpdb_results = cpdb_statistical_analysis_method.call(
    cpdb_file_path=cpdb_file_path,
    meta_file_path=meta_file_path,
    counts_file_path=counts_file_path,
    counts_data='ensembl',
    threshold=0.1,
    output_path=output_path,
    subsampling_num_cells=100)

When I lowered the number of genes to about 300, it ran but didn't show any significant interactions. Is there any way around this problem? Can we run it with as many genes as possible without crashing the system?
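One possible workaround, sketched here under assumptions rather than as a confirmed fix, is to reduce the number of cells instead of the number of genes by keeping at most a fixed number of randomly chosen cells per cell type before writing the counts and metadata files. The helper name, the barcode-to-cell-type mapping, and the cap of 4 cells are hypothetical. Separately, the `call` signature in the traceback above lists a `subsampling` argument; whether `subsampling_num_cells` has any effect without it is not confirmed in this thread.

```python
import random
from collections import defaultdict


def downsample_cells(cell_types, max_cells_per_type, seed=0):
    """Return sorted barcodes, keeping at most max_cells_per_type per cell type."""
    by_type = defaultdict(list)
    for barcode, cell_type in cell_types.items():
        by_type[cell_type].append(barcode)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    kept = []
    for barcodes in by_type.values():
        k = min(len(barcodes), max_cells_per_type)
        kept.extend(rng.sample(barcodes, k))
    return sorted(kept)


# Toy metadata: 7 cells of type "T" and 3 of type "B" (hypothetical labels).
cell_types = {f"cell{i}": ("T" if i < 7 else "B") for i in range(10)}
kept = downsample_cells(cell_types, max_cells_per_type=4)
kept_types = [cell_types[barcode] for barcode in kept]
print(len(kept), kept_types.count("T"), kept_types.count("B"))
```

The retained barcodes would then be used to subset both the metadata rows and the counts columns so that the two files stay consistent. Capping per cell type rather than sampling uniformly keeps rare populations represented, which matters because interactions are scored per pair of cell types.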