Closed ddiez closed 10 months ago
Hi, in the latest release of CellphoneDB (https://pypi.org/project/CellphoneDB/) the precision of the p-values in pvalues.txt is up to 3 decimal places. Hopefully this is enough for what you need. Best wishes, Robert.
Hi! I may be misunderstanding something, but I am a little confused.
In the Method 2 notebook, the description of the values field says:
cell_a|cell_b: 1 if interaction is detected as significant, 0 if not.
So, is the output binary? My method 2 output contains other numbers between 0 and 1 (such as 0.208, 0.004, etc.) in addition to 0 and 1. Or are these numbers the actual p-values, with 0 meaning 0.000XX...?
I am using CellphoneDB v4.0.0 in a conda environment.
Thank you for your help!
Details are below:
deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
cpdb_file_path = cpdb_file_path, # mandatory: CellPhoneDB database zip file.
meta_file_path = meta_file_path, # mandatory: tsv file defining barcodes to cell label.
counts_file_path = counts_file_path, # mandatory: normalized count matrix.
counts_data = 'hgnc_symbol', # defines the gene annotation in counts matrix.
iterations = 1000, # denotes the number of shufflings performed in the analysis.
threshold = 0.1, # defines the min % of cells expressing a gene for this to be employed in the analysis.
threads = 4, # number of threads to use in the analysis.
debug_seed = 42, # debug random seed. Enabled when >= 0.
result_precision = 3, # Sets the rounding for the mean values in significant_means.
pvalue = 0.05, # P-value threshold to employ for significance.
subsampling = False, # To enable subsampling the data (geometric sketching).
subsampling_log = False, # (mandatory) enable subsampling log1p for non log-transformed data inputs.
subsampling_num_pc = 100, # Number of components to subsample via geometric sketching (default: 100).
subsampling_num_cells = 1000, # Number of cells to subsample (integer) (default: 1/3 of the dataset).
separator = '|', # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
debug = False, # Saves all intermediate tables employed during the analysis in pkl format.
output_path = out_path, # Path to save results.
output_suffix = 'XXX' # Replaces the timestamp in the output files by a user defined string $
)
cellphoneDB.sbatch
Reading user files...
The following user files were loaded successfully:
mySeurat.h5ad
metadata.tsv
[ ][CORE][27/04/23-12:04:01][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:4 Precision:3
[ ][CORE][27/04/23-12:04:01][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][27/04/23-12:04:07][INFO] Running Real Analysis
[ ][CORE][27/04/23-12:04:07][INFO] Running Statistical Analysis
  0%| | 0/1000 [00:00<?, ?it/s]
  0%| | 1/1000 [00:02<37:59, 2.28s/it]
  0%| | 2/1000 [00:02<21:41, 1.30s/i$
[ ][CORE][27/04/23-12:14:06][INFO] Building results
Hi @stanaka6
You are right: for method 2, the pvalues file contains the actual p-values obtained from the permutation analysis. The description you quoted (1 if significant, 0 if not) corresponds to method 3.
Kind regards
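For anyone who wants a binary significant/not-significant view from the method 2 p-values, here is a minimal sketch. It assumes pvalues.txt is tab-separated and that cell-pair columns are named with the `|` separator, as in the call above. The toy table, the column names and the helper `binarize_pvalues` are invented for illustration and are not part of CellphoneDB. The strict `<` comparison against the 0.05 threshold is also an assumption.

```python
import csv
import io

# Invented stand-in for a few columns of pvalues.txt (assumed tab-separated).
TOY_PVALUES = (
    "interacting_pair\tT|B\tB|T\n"
    "A_B\t0.208\t0.0\n"
    "C_D\t0.004\t0.06\n"
    "E_F\t1.0\t0.049\n"
)

def binarize_pvalues(text, alpha=0.05, separator="|"):
    """Map each cell-pair p-value to 1 (p < alpha) or 0 (otherwise)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        out = {"interacting_pair": row["interacting_pair"]}
        for column, value in row.items():
            if separator in column:  # cell-pair columns look like "cellA|cellB"
                out[column] = int(float(value) < alpha)
        rows.append(out)
    return rows

binary = binarize_pvalues(TOY_PVALUES)
for row in binary:
    print(row)
```

With a real run, the same comparison could be applied to the cell-pair columns of the `pvalues` table returned by the analysis call.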
It would be useful to be able to increase the precision of the p-values in the pvalues.txt output file (similar to the --result-precision argument for the means). At the moment there is only 1 digit of precision, so the significant interactions will be anything less than 0.1. Although perhaps I am missing something?
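To illustrate the concern, here is a small sketch of what rounding does to permutation p-values. It assumes the p-value is simply the fraction of shuffles at least as extreme as the observed value, so with 1000 iterations the finest step is 0.001. The helper `rounded_views` and the counts are hypothetical and are not taken from CellphoneDB.

```python
def rounded_views(n_as_extreme, iterations):
    """Return the raw p-value and its 1-digit and 3-digit roundings."""
    p = n_as_extreme / iterations  # assumed definition of the empirical p-value
    return p, round(p, 1), round(p, 3)

iterations = 1000
for n_as_extreme in (4, 40, 208):
    p, one_digit, three_digits = rounded_views(n_as_extreme, iterations)
    print(f"p={p} one_digit={one_digit} three_digits={three_digits}")
```

In this sketch 0.004 and 0.04 both show up as 0.0 at one digit, so a stricter cut-off such as 0.01 could no longer be applied to the file, while three digits keep every step that 1000 iterations can produce.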