ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

Precision of pvalues #60

Closed ddiez closed 10 months ago

ddiez commented 2 years ago

It would be useful to be able to increase the precision of the p-values in the pvalues.txt output file (similar to the --result-precision argument for the means). At the moment there is only one digit of precision, so the significant interactions will be anything less than 0.1. Although perhaps I am missing something?

datasome commented 1 year ago

Hi, in the latest release of CellphoneDB (https://pypi.org/project/CellphoneDB/) the precision of the p-values in pvalues.txt is up to 3 decimal places. I hope this is enough for what you need. Best wishes, Robert.
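To illustrate why the number of decimals matters, here is a minimal sketch in plain Python. It does not use CellphoneDB itself; the p-values and the 0.01 cutoff are made up for illustration, and the helper name `significant` is hypothetical.

```python
def significant(pvalues, threshold, decimals):
    """Return the values that pass `threshold` after rounding to `decimals` places."""
    return [p for p in pvalues if round(p, decimals) < threshold]


# Illustrative p-values only; they are not taken from a real CellphoneDB run.
pvals = [0.004, 0.04, 0.06, 0.208]

# With one decimal, 0.004 and 0.04 both round to 0.0 and cannot be told apart.
one_digit = significant(pvals, threshold=0.01, decimals=1)

# With three decimals, only 0.004 passes a 0.01 cutoff.
three_digits = significant(pvals, threshold=0.01, decimals=3)

print(one_digit)
print(three_digits)
```

With one decimal place, any cutoff stricter than roughly 0.05 cannot be applied reliably, which is the limitation described in the original report.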

stanaka6 commented 1 year ago

Hi! I may be misunderstanding something, but I am a little confused.

In the Method 2 notebook, the description of the values field says:

cell_a|cell_b: 1 if interaction is detected as significant, 0 if not.

So, the output is binary? But my method 2 output contains other numbers between 0 and 1 (such as 0.208, 0.004, etc.) in addition to 0 and 1. Or are the numbers actual p-values, with 0 meaning 0.000XX...?

I am using CellphoneDB v4.0.0 in a conda environment.

Thank you for your help!

Details are below:

deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,                 # mandatory: CellPhoneDB database zip file.
    meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,             # mandatory: normalized count matrix.
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
    iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
    threshold = 0.1,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 4,                                     # number of threads to use in the analysis.
    debug_seed = 42,                                 # debug random seed. To disable >=0.
    result_precision = 3,                            # Sets the rounding for the mean values in significant_means.
    pvalue = 0.05,                                   # P-value threshold to employ for significance.
    subsampling = False,                             # To enable subsampling the data (geometric sketching).
    subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                        # Number of components to subsample via geometric sketching (default: 100).
    subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                          # Path to save results.
    output_suffix = 'XXX'                             # Replaces the timestamp in the output files by a user defined string $
    )

cellphoneDB.sbatch
Reading user files...
The following user files were loaded successfully:
mySeurat.h5ad
metadata.tsv
[ ][CORE][27/04/23-12:04:01][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:4 Precision:3
[ ][CORE][27/04/23-12:04:01][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][27/04/23-12:04:07][INFO] Running Real Analysis
[ ][CORE][27/04/23-12:04:07][INFO] Running Statistical Analysis
  0%| | 0/1000 [00:00<?, ?it/s]
  0%| | 1/1000 [00:02<37:59, 2.28s/it]
  0%| | 2/1000 [00:02<21:41, 1.30s/i$
[ ][CORE][27/04/23-12:14:06][INFO] Building results
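If the cell_a|cell_b columns hold p-values rather than 0/1 flags, they can be thresholded after the run. The sketch below uses only the Python standard library and a small inline table. It assumes the pvalues file is tab-separated and that cell-pair columns are exactly those containing the separator; the `interacting_pair` column, the cell labels, and the ligand/receptor names in the toy table are placeholders, not real output.

```python
import csv
import io


def significant_pairs(pvalues_text, threshold=0.05, separator="|"):
    """List (interaction, cell pair, p-value) entries below `threshold`.

    Assumes a tab-separated table in which every column whose name
    contains `separator` is a cell_a|cell_b p-value column.
    """
    reader = csv.DictReader(io.StringIO(pvalues_text), delimiter="\t")
    hits = []
    for row in reader:
        for column, value in row.items():
            if separator in column and float(value) < threshold:
                hits.append((row["interacting_pair"], column, float(value)))
    return hits


# Toy table for illustration only (hypothetical names and values).
toy = (
    "interacting_pair\tT|B\tB|T\n"
    "LIG1_REC1\t0.208\t0.004\n"
    "LIG2_REC2\t0.000\t1.000\n"
)

hits = significant_pairs(toy)
for interaction, pair, p in hits:
    print(interaction, pair, p)
```

The `threshold` and `separator` defaults mirror the `pvalue = 0.05` and `separator = '|'` arguments in the call above.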

ktroule commented 1 year ago

Hi @stanaka6

You are right: for method 2, the pvalues file contains the actual p-values obtained from the permutation analysis. The "1 if significant, 0 if not" description you quote corresponds to method 3.

Kind regards
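As a rough illustration of why a permutation p-value can be exactly 0, here is a generic sketch of an empirical p-value. This is not CellphoneDB's code: the helper `empirical_pvalue`, the simulated null values, and the choice of `>=` for the comparison are all assumptions made for the example.

```python
import random


def empirical_pvalue(observed, null_values):
    """Fraction of null (shuffled) values at least as large as the observed one."""
    hits = sum(1 for v in null_values if v >= observed)
    return hits / len(null_values)


rng = random.Random(42)
iterations = 1000  # same number of shufflings as in the call quoted above

# Simulated null distribution standing in for the shuffled means (assumption).
null_means = [rng.gauss(0.0, 1.0) for _ in range(iterations)]

# The smallest non-zero p-value that this many shufflings can produce.
resolution = 1 / iterations

# An observed value larger than every shuffled value gives a p-value of 0.
p_extreme = empirical_pvalue(max(null_means) + 1.0, null_means)

# A typical observed value gives a p-value that is a multiple of `resolution`.
p_typical = empirical_pvalue(0.0, null_means)

print(resolution, p_extreme, p_typical)
```

Under these assumptions, a 0 in the output means that none of the shufflings reached the observed value, so the true p-value is only known to be below 1/iterations. With 1000 iterations, three decimal places are enough to represent every possible value.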