ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

Invalid Counts data #126

Closed zhuyewen closed 7 months ago

zhuyewen commented 1 year ago

Hello, cellphonedb team!

Thank you very much for open-sourcing such a great single-cell tool. I have been using it from version 3.0 through the current 4.0, and it has helped me a lot.

However, I have run into a problem with the recent 4.0 release. Everything works fine with the data you provide, but with my own data I get an error.

I set up the inputs like this:

cpdb_file_path = '/Users/zhuyewen/Downloads/CellphoneDB-master/db/v4.1.0/cellphonedb.zip'

meta_file_path = '/Users/zhuyewen/R/技术摸索/23.cellphonedb/metadata.tsv'
counts_file_path = '/Users/zhuyewen/R/技术摸索/23.cellphonedb/pbmc3k.h5ad'

out_path = '/Users/zhuyewen/Downloads/CellphoneDB-master/result/method2'

...

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,                 # mandatory: CellPhoneDB database zip file.
    meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,             # mandatory: normalized count matrix.
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
#     microenvs_file_path = microenvs_file_path,       # optional (default: None): defines cells per microenvironment.
    iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
    threshold = 0.1,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 4,                                     # number of threads to use in the analysis.
    debug_seed = 42,                                 # debug random seed. To disable >=0.
    result_precision = 3,                            # Sets the rounding for the mean values in significant_means.
    pvalue = 0.05,                                   # P-value threshold to employ for significance.
    subsampling = False,                             # To enable subsampling the data (geometric sketching).
    subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                        # Number of components to subsample via geometric sketching (default: 100).
    subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                          # Path to save results.
    output_suffix = None                             # Replaces the timestamp in the output files with a user-defined string (default: None).
    )

And I got this error:

ParseCountsException: Invalid Counts data
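
One thing worth checking for an error like this, although it is not confirmed as the cause in this thread, is whether the barcodes in the metadata file match the cell names in the counts file. The following is a minimal sketch using only the standard library. It assumes the metadata TSV has a header row and the barcodes in its first column; the helper names `read_meta_barcodes` and `compare_barcodes` are hypothetical and not part of CellPhoneDB.

```python
import csv


def read_meta_barcodes(meta_path):
    """Return the barcodes from the first column of a tab-separated metadata file."""
    with open(meta_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row (assumed to be present)
        return [row[0] for row in reader if row]


def compare_barcodes(meta_barcodes, counts_barcodes):
    """Report barcodes that appear in only one of the two inputs."""
    meta_set, counts_set = set(meta_barcodes), set(counts_barcodes)
    return {
        "only_in_meta": sorted(meta_set - counts_set),
        "only_in_counts": sorted(counts_set - meta_set),
    }


# The counts barcodes would normally come from the h5ad file, for example
# list(anndata.read_h5ad(counts_file_path).obs_names), which needs the anndata package.
```

If `only_in_meta` is not empty, some cells listed in the metadata are absent from the counts matrix, which is worth fixing before rerunning the analysis.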

Here's how I generated the expression matrix file in RStudio:


library(SeuratDisk)
library(Seurat)

SaveH5Seurat(pbmc3k.final, filename = "pbmc3k.h5Seurat")
Convert("pbmc3k.h5Seurat", dest = "h5ad",assay = 'RNA')
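
A converted file can be sanity-checked on the Python side before it is passed to CellPhoneDB. The call above expects a normalized count matrix with gene names as HGNC symbols (`counts_data = 'hgnc_symbol'`), and a conversion may leave scaled values or a different gene identifier in the main matrix; that is a possibility to rule out, not a diagnosis confirmed in this thread. The sketch below applies three rough heuristics to a sample of matrix values and gene names. The function `diagnose_counts` and its messages are hypothetical.

```python
import re

# Human Ensembl gene IDs look like ENSG followed by 11 digits, optionally with a version.
ENSEMBL_PATTERN = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def diagnose_counts(values, gene_names):
    """Return a list of warnings for a sample of matrix values and gene names."""
    warnings = []
    values = list(values)
    if any(v < 0 for v in values):
        # Normalized expression is non-negative; negatives suggest scaled data.
        warnings.append("negative values: matrix may hold scaled data")
    elif values and all(float(v).is_integer() for v in values):
        # Only whole numbers suggests raw counts rather than normalized values.
        warnings.append("only integer values: matrix may hold raw counts")
    if gene_names and all(ENSEMBL_PATTERN.match(g) for g in gene_names):
        warnings.append("gene names look like Ensembl IDs, not HGNC symbols")
    return warnings
```

With the anndata package installed, the inputs could be taken from `adata = anndata.read_h5ad(counts_file_path)`, using a slice of `adata.X` for the values and `adata.var_names` for the gene names. These are heuristics only: a small sample made up entirely of zeros, for instance, would trigger the integer warning.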

I am not sure whether something is wrong with the code I used to generate the h5ad file. If so, could you please provide code showing how to generate that file in R?

I browsed through all the issues and found no similar questions or answers. I hope you have time to explain how to generate an h5ad file from a Seurat object in R, which would also help people who encounter the same problem later. Thanks a lot.

My platform: macOS 13.3, Apple M1 Ultra, R version 4.3.0, Python 3.8

Best regards

Yewen Zhu

ktroule commented 1 year ago

Hi.

We have this notebook on how to convert your Seurat object for CellPhoneDB. Another option is to use sceasy to convert from Seurat to Scanpy.

Kind regards

luzgaral commented 1 year ago

Hi,

You can also use the strategies described here (how-to-extract-the-cellphonedb-input-files-from-a-seurat-object) to convert Seurat objects into formats accepted by CellPhoneDB.

Best

Luz