ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

Can the minimum p-value be less than 1e-3? #179

Closed yxwucq closed 3 months ago

yxwucq commented 3 months ago

I'm not sure, but I assume the p-value is calculated by random permutation. However, when I increase the number of permutations, e.g. by setting iterations to 100000, the minimum p-value in the output file 'statistical_analysis_pvalues.txt' is still 1e-3.

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

cpdb_results = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,                 # mandatory: CellphoneDB database zip file.
    meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,             # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData object
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
    # active_tfs_file_path = active_tf_path,           # optional: defines cell types and their active TFs.
    # microenvs_file_path = microenvs_file_path,       # optional (default: None): defines cells per microenvironment.
    score_interactions = True,                       # optional: whether to score interactions or not. 
    iterations = 100000,                               # denotes the number of shufflings performed in the analysis.
    threshold = 0.05,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 12,                                     # number of threads to use in the analysis.
    debug_seed = 1,                                 # debug random seed. To disable >=0.
    result_precision = 3,                            # Sets the rounding for the mean values in significant_means.
    pvalue = 0.05,                                   # P-value threshold to employ for significance.
    subsampling = False,                             # To enable subsampling the data (geometric sketching).
    subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                        # Number of components to subsample via geometric sketching (default: 100).
    # subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                          # Path to save results.
    output_suffix = None                             # Replaces the timestamp in the output files by a user defined string in the  (default: None).
    )
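
For context on why raising iterations should lower the floor: if the p-value is an empirical permutation p-value (an assumption here, taken as the fraction of shuffled means at least as large as the observed mean), its smallest nonzero value is 1 / iterations. The sketch below illustrates that with made-up null values; permutation_pvalue and smallest_nonzero_pvalue are hypothetical helpers, not CellphoneDB functions.

```python
import random


def permutation_pvalue(observed, shuffled_means):
    """Fraction of shuffled means that are at least as large as the observed mean."""
    hits = sum(1 for m in shuffled_means if m >= observed)
    return hits / len(shuffled_means)


def smallest_nonzero_pvalue(iterations):
    """Resolution of an empirical p-value: one hit out of `iterations` shuffles."""
    return 1.0 / iterations


random.seed(1)
for iterations in (1000, 100000):
    # Stand-in null distribution; real shuffled means would come from label permutations.
    shuffled = [random.gauss(0.0, 1.0) for _ in range(iterations)]
    # An observed value far in the tail, so no shuffle is expected to reach it.
    p = permutation_pvalue(10.0, shuffled)
    print(iterations, p, smallest_nonzero_pvalue(iterations))
```

With 1000 shuffles the finest nonzero step is 1e-3, and with 100000 it is 1e-5, so a minimum stuck at 1e-3 after raising iterations points to something downstream of the permutation test.
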
yxwucq commented 3 months ago

Sorry, I've checked the result: the p-values in 'statistical_analysis_pvalues.txt' are rounded to 1e-5 as expected. The problem is in the plotting library. The kpy.plot_cpdb function automatically replaces a p-value of exactly 0 with 0.001, which puts the result on a 1e-3 scale. From https://github.com/zktuong/ktplotspy/blob/master/ktplotspy/plot/plot_cpdb.py:

for i in df.index:
    if df.at[i, "pvals"] < alpha:
        df.at[i, "x_means"] = np.nan
        if df.at[i, "pvals"] == 0:
            df.at[i, "pvals"] = 0.001
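
The loop above hard-codes 0.001 as the replacement for a zero p-value. One possible alternative, shown here only as a sketch on a plain list of made-up p-values, is to derive the floor from the number of iterations so that zeros land at the true resolution of the test. floor_zero_pvalues is a hypothetical helper, not part of ktplotspy or CellphoneDB.

```python
import math


def floor_zero_pvalues(pvals, iterations):
    """Return a new list where exact zeros become 1 / iterations."""
    floor = 1.0 / iterations
    return [floor if p == 0 else p for p in pvals]


# Illustrative values only; the first entry mimics a p-value reported as 0.
pvals = [0.0, 0.00002, 0.04, 1.0]
floored = floor_zero_pvalues(pvals, iterations=100000)

# A -log10 transform, as commonly used for dot sizes, is now finite for every entry.
neg_log10 = [-math.log10(p) for p in floored]
print(floored)
print(neg_log10)
```

The input list is left unchanged, so the original zeros remain available for any check that needs them.
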