ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.

All counts filtered #171

Open sierranishizaki opened 4 months ago

sierranishizaki commented 4 months ago

Is there a reason y'all can think of that I would be getting an 'all counts filtered' error when using human genes?

I started with a mouse Seurat object and converted it into an h5ad object with human genes using the orthogene package. When running my whole ~42k cell dataset I got "No Interactions Found". I saw in another post that this error may be caused by too many cells, so I've now subset my data to 28k cells and am getting this error message:

[ ][CORE][07/02/24-16:33:59][INFO] [Non Statistical Method] Threshold:0.1 Precision:3
Reading user files...
The following user files were loaded successfully:
AllCountsFilteredException                Traceback (most recent call last)
Cell In[30], line 3
      1 from cellphonedb.src.core.methods import cpdb_analysis_method
----> 3 cpdb_results = cpdb_analysis_method.call(
      4     cpdb_file_path = cpdb_file_path,           # mandatory: CellphoneDB database zip file.
      5     meta_file_path = meta_file_path,           # mandatory: tsv file defining barcodes to cell label.
      6     counts_file_path = counts_file_path,       # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData object
      7     counts_data = 'hgnc_symbol',               # defines the gene annotation in counts matrix.
      8     microenvs_file_path = None,#microenvs_file_path, # optional (default: None): defines cells per microenvironment.
      9     score_interactions = True,                 # optional: whether to score interactions or not. 
     10     output_path = out_path,                    # Path to save results    microenvs_file_path = None,
     11     separator = '|',                           # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
     12     threads = 5,                               # number of threads to use in the analysis.
     13     threshold = 0.1,                           # defines the min % of cells expressing a gene for this to be employed in the analysis.
     14     result_precision = 3,                      # Sets the rounding for the mean values in significan_means.
     15     debug = False,                             # Saves all intermediate tables emplyed during the analysis in pkl format.
     16     output_suffix = None                       # Replaces the timestamp in the output files by a user defined string in the  (default: None)
     17 )

File ~/Desktop/CellphoneDB/cellphonedb/src/core/methods/, in call(cpdb_file_path, meta_file_path, counts_file_path, counts_data, output_path, microenvs_file_path, separator, threshold, result_precision, debug, output_suffix, score_interactions, threads)
     96 counts, counts_relations = cpdb_statistical_analysis_helper.add_multidata_and_means_to_counts(
     97     counts, genes, counts_data)
     98 if counts.empty:
---> 99     raise AllCountsFilteredException(hint='Are you using human data?')
    101 interactions_filtered, counts_filtered, complex_composition_filtered = \
    102     cpdb_statistical_analysis_helper.prefilters(interactions_reduced,
    103                                                 counts,
    104                                                 complexes,
    105                                                 complex_compositions)
    106 if interactions_filtered.empty:

AllCountsFilteredException: All counts filtered

Double checking, my gene names do appear to be human (filtered for only 1-to-1 genes).

> adata.var
0   NaN
1   HBA2
2   NaN
3   HBA2
4   NaN
... ...
1995    CLNS1A
1996    PSMA3
1997    PTBP1
1998    SLC22A23
1999    TOMM70

Any hints as to why I can't seem to get this data to behave in CellPhoneDB would be appreciated!

datasome commented 4 months ago

Hi sierranishizaki,

Another (though less likely) reason for AllCountsFilteredException may be that your genes do not overlap with those in the CellphoneDB database. Could you please confirm whether any of the genes in adata.var occur in the database's gene_input.csv file?
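
One way to run that overlap check is sketched below, using only the Python standard library. It is a rough illustration, not CellphoneDB's own code: the helper names `load_db_symbols` and `overlap_report` are hypothetical, the column name `hgnc_symbol` is an assumption about the layout of gene_input.csv, and the CSV text in the example is toy data rather than real database content.

```python
import csv
import io


def load_db_symbols(csv_text, column="hgnc_symbol"):
    """Collect the non-empty values of one column from gene_input.csv text.

    The default column name is an assumption; adjust it to match the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    }


def overlap_report(dataset_genes, db_symbols):
    """Return the sorted dataset gene names that also occur in the database.

    Missing entries (None, NaN, empty strings) are dropped before comparing.
    """
    cleaned = set()
    for gene in dataset_genes:
        text = "" if gene is None else str(gene).strip()
        if text and text.lower() != "nan":
            cleaned.add(text)
    return sorted(cleaned & db_symbols)


if __name__ == "__main__":
    # Toy stand-in for the contents of gene_input.csv (not real database rows).
    toy_csv = "gene_name,hgnc_symbol\nHBA2,HBA2\nPTBP1,PTBP1\n"
    db = load_db_symbols(toy_csv)
    # Toy stand-in for the gene names taken from the dataset.
    shared = overlap_report([None, "HBA2", "TOMM70", "PTBP1"], db)
    print(len(shared), "genes overlap:", shared)
```

With a real file, the CSV text could come from `open(path).read()`, and the dataset genes from whichever part of the AnnData object holds the gene symbols.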



sierranishizaki commented 4 months ago

Thanks for the speedy response Robert! It appears that 208 of the 2k genes in my dataset have a match in gene_input.csv.

Just checking, is adata.var the place in the AnnData file CellPhoneDB is looking for gene names?

datasome commented 4 months ago

Hi sierranishizaki,

CellphoneDB is looking for gene names in the index of adata.to_df().T - hope that helps.
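
Since the gene names are read from that index, it may be worth checking what the index actually contains. The `adata.var` printout earlier in the thread appears to show integer-like row labels with the gene symbols (and some NaN entries) in a column, which would mean the index holds positions rather than symbols. The sketch below is one possible sanity check under that assumption. The helper name `suspicious_labels` is hypothetical and the function works on any iterable of labels, so it does not depend on AnnData itself.

```python
def suspicious_labels(index_labels):
    """Split labels into (integer_like, missing) lists for a quick sanity check.

    integer_like: labels such as "0", "1", 2 that look like row positions.
    missing: None, NaN, or empty labels.
    """
    integer_like, missing = [], []
    for label in index_labels:
        text = "" if label is None else str(label).strip()
        if text == "" or text.lower() == "nan":
            missing.append(label)
        elif text.isdigit():
            integer_like.append(label)
    return integer_like, missing


if __name__ == "__main__":
    # In practice the labels would come from: adata.to_df().T.index
    labels = ["0", "1", "2", "HBA2", None]
    integer_like, missing = suspicious_labels(labels)
    print(len(integer_like), "integer-like labels,", len(missing), "missing labels")
```

If most labels come back as integer-like, one option is to reassign `adata.var_names` from the column of `adata.var` that holds the gene symbols, after dropping the rows where the symbol is missing or duplicated. Which column that is depends on how the object was converted.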

