Error when attempting to use a path which contains mtx/barcode/features files as an input

SuiyueWestlake commented 2 years ago

Hi,

I am trying to use a folder path as the input. My command is:

cellphonedb method statistical_analysis /Users/tangsuiyue/Documents/Documents/Project/MUSE/Related_paper_data/cpdb

But I got the following error:

Usage: cellphonedb method statistical_analysis [OPTIONS] META_FILENAME COUNTS_FILENAME
Try 'cellphonedb method statistical_analysis --help' for help.

Error: Missing argument 'COUNTS_FILENAME'.

How can I fix this problem? Thank you!

Leonhard2000 commented 2 years ago

Dear Suiyue,

CPDB requires the names of your "counts" and "meta" files, but you only provided the path to the folder. The correct command looks like this:

cellphonedb method statistical_analysis folder_1/folder_2/filename_meta.txt folder_1/folder_2/filename_counts.txt

Here "folder_1/folder_2/" represents your path, and "filename_meta.txt" and "filename_counts.txt" represent your input files.

In your case, try the following (if you use the CPDB test files; otherwise change the .txt file names):

cellphonedb method statistical_analysis /Users/tangsuiyue/Documents/Documents/Project/MUSE/Related_paper_data/cpdb/test_meta.txt /Users/tangsuiyue/Documents/Documents/Project/MUSE/Related_paper_data/cpdb/test_counts.txt
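The point above, that both positional arguments must be file paths rather than a single folder, can be sketched in Python. This is only an illustration: `build_cpdb_command` is a hypothetical helper, not part of CellPhoneDB, and the file names are the placeholders used in the comment.

```python
from pathlib import Path


def build_cpdb_command(data_dir, meta_name, counts_name):
    """Return the argument list for `cellphonedb method statistical_analysis`.

    Hypothetical helper for illustration: it only joins paths. It does not
    check that the files exist and it does not run anything.
    """
    data_dir = Path(data_dir)
    meta_path = data_dir / meta_name      # META_FILENAME, first positional argument
    counts_path = data_dir / counts_name  # COUNTS_FILENAME, second positional argument
    return [
        "cellphonedb", "method", "statistical_analysis",
        meta_path.as_posix(), counts_path.as_posix(),
    ]


if __name__ == "__main__":
    cmd = build_cpdb_command(
        "folder_1/folder_2", "filename_meta.txt", "filename_counts.txt"
    )
    print(" ".join(cmd))
```

Passing only the folder, as in the original command, fills META_FILENAME and leaves COUNTS_FILENAME empty, which matches the "Missing argument" error.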

SuiyueWestlake commented 2 years ago

Dear Leonhard,

I'm not sure I understand the instruction correctly. It says "Counts file can be a text file or a h5ad (recommended), h5 or a path to a folder containing a 10x output with mtx/barcode/features files."

So, if I want to give a path containing a 10x output with mtx/barcode/features files, I still need to prepare the counts file?

Leonhard2000 commented 2 years ago

Dear Suiyue,

I have never worked with 10x or h5ad files, but according to the instructions ("Run example", "Using h5ad count file") you still have to give the software the file names:

cellphonedb method analysis test_meta.txt test_counts.h5ad

So you do not need to prepare a separate counts file (use the .h5ad instead of the .txt). But it seems you do need to prepare the meta file, which should look like this (in this case blood cell types define the groups):

Cell                 cell_type
AAACCTGAGACGCTTT-1   T
AAACCTGAGTTTAGGA-1   T
AAACCTGCAAAGTGCG-1   Mono
AAACCTGCAAGTACCT-1   Mono
AAACGGGAGACCTAGG-1   T
AAACGGGAGCCTTGAT-1   NK
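A meta file with the two columns shown above could be written with a few lines of Python. This is a minimal sketch: `write_meta` is a hypothetical helper, and the tab separator is an assumption, since the thread does not state which delimiter the file uses.

```python
import csv
import io


def write_meta(cell_types, handle):
    """Write a two-column meta table with the header `Cell`, `cell_type`.

    `cell_types` is an iterable of (barcode, cell_type) pairs.
    The tab delimiter is an assumption made for this sketch.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Cell", "cell_type"])
    for barcode, cell_type in cell_types:
        writer.writerow([barcode, cell_type])


if __name__ == "__main__":
    rows = [
        ("AAACCTGAGACGCTTT-1", "T"),
        ("AAACCTGCAAAGTGCG-1", "Mono"),
        ("AAACGGGAGCCTTGAT-1", "NK"),
    ]
    buffer = io.StringIO()  # stand-in for an open file
    write_meta(rows, buffer)
    print(buffer.getvalue())
```

To write to disk, an open file handle (for example `open("meta.txt", "w", newline="")`) can be passed in place of the `StringIO` buffer.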

SuiyueWestlake commented 2 years ago

Dear Leonhard,

I now use a meta file and an h5ad file as my input, but the error says that 10.3 TiB is needed to run cpdb. Does it really need such huge computing resources, or did I do something wrong? (There are 312928 cells in my data.)

My command is:

cellphonedb method analysis MetadataTable.txt GSE136831.h5ad --output-path fibrosis_GSE136831_cpdb_result

And my error is as follows:

Traceback (most recent call last):
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/api_endpoints/terminal_api/method_terminal_api_endpoints/method_terminal_commands.py", line 216, in analysis
    debug,
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/local_launchers/local_method_launcher.py", line 119, in cpdb_analysis_local_method_launcher
    output_path)
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/core/methods/method_launcher.py", line 155, in cpdb_method_analysis_launcher
    output_path)
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_analysis_method.py", line 103, in call
    separator)
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_helper.py", line 282, in build_result_matrix
    result = pd.DataFrame(index=interactions.index, columns=columns, dtype=float)
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/frame.py", line 468, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 269, in init_dict
    arrays.loc[missing] = [val] * missing.sum()
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/indexing.py", line 670, in __setitem__
    iloc._setitem_with_indexer(indexer, value)
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/indexing.py", line 1800, in _setitem_with_indexer
    self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 534, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 406, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/lixuLab/suiyue/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 849, in setitem
    arr_value = np.array(value)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 10.3 TiB for an array with shape (674181225, 2092) and data type float64

Leonhard2000 commented 2 years ago

Dear Suiyue,

To be clear, I am just a CPDB user like you and have never used h5ad files. At least it seems to run now, and someone more experienced might be able to help you further.

And yes, CPDB uses a lot of memory, especially for big files like yours. I have only a little coding knowledge, but it seems that CPDB tries to read your whole file into memory at once, so your RAM has to be larger than your file. Maybe subsampling helps (see https://github.com/ventolab/CellphoneDB): --subsampling --subsampling-num-cells

CPDB "only" has 1620 gene entries, and your data probably has more; the extra genes are not used but may still be read into memory. A new, smaller dataset containing only these 1620 genes would reduce your file size, but creating it requires some coding knowledge.
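The gene-filtering idea above can be sketched for a plain-text counts table. This is a minimal illustration under stated assumptions: the table is tab-separated with one gene per row and the gene identifier in the first column, `filter_counts` is a hypothetical helper, and the set of genes to keep has to be obtained separately (the thread does not say how to export it from CPDB).

```python
import csv
import io


def filter_counts(in_handle, out_handle, keep_genes):
    """Copy a tab-separated counts table, keeping only rows for wanted genes.

    Assumes (for this sketch) one gene per row with the gene identifier in
    the first column. The header row is always copied. Returns the number
    of gene rows written. Rows are streamed, so the table is never held
    in memory as a whole.
    """
    reader = csv.reader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(next(reader))  # header: gene column followed by cell barcodes
    kept = 0
    for row in reader:
        if row and row[0] in keep_genes:
            writer.writerow(row)
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny made-up table; gene and cell names are placeholders.
    source = io.StringIO("Gene\tcell_1\tcell_2\nG1\t1\t0\nG2\t0\t5\nG3\t2\t2\n")
    target = io.StringIO()
    n = filter_counts(source, target, keep_genes={"G1", "G3"})
    print(n)
    print(target.getvalue())
```

This only shrinks the input file. Whether it also reduces CPDB's own memory use is not established in the thread.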

It looks like I can't be of any further help.

SuiyueWestlake commented 2 years ago

Dear Leonhard,

Thank you so much for your help! I thought that you were one of the developers. Maybe I should contact the developers for further help or switch to another software to analyze my data.

Best, Yue

prete commented 2 years ago

Hi @SuiyueWestlake, as @Leonhard2000 very accurately pointed out, CellPhoneDB has a very large memory footprint. However, your error shows 10.3 TiB, which is not normal at all. Could I quickly check how your GSE136831.h5ad was crafted?
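The 10.3 TiB figure can be reproduced from the array shape and data type reported in the traceback. The short Python check below (it needs Python 3.8+ for `math.isqrt`) also shows that the first dimension is a perfect square. The reading of that square is an assumption and is not confirmed anywhere in the thread: if that dimension counts ordered pairs of cluster labels, it would correspond to 25965 distinct labels in the meta file.

```python
import math

rows, cols = 674181225, 2092  # shape reported in the error message
bytes_per_value = 8           # float64 uses 8 bytes per value

total_bytes = rows * cols * bytes_per_value
tib = total_bytes / 2**40     # 1 TiB = 2**40 bytes

# Check whether the first dimension is a perfect square.
# Assumption (not confirmed in the thread): it may count ordered pairs
# of cluster labels, i.e. n_labels * n_labels.
side = math.isqrt(rows)
is_square = side * side == rows

print(f"{tib:.1f} TiB")  # 10.3 TiB
print(side, is_square)   # 25965 True
```

A count of distinct labels in the cell-type column of MetadataTable.txt would confirm or rule out this reading.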

SuiyueWestlake commented 2 years ago

Yes, of course! My code is as follows:

`matrix_dir = "/storage/lixuLab/suiyue/CellPhoneDB/fibrosis/IPF_Cell_Atlas_GSE136831/"
barcode.path <- paste0(matrix_dir, "barcodes.tsv")
features.path <- paste0(matrix_dir, "features.tsv")
matrix.path <- paste0(matrix_dir, "matrix.mtx")
mat <- readMM(file = matrix.path)
feature.names = read.delim(features.path, header = FALSE, stringsAsFactors = FALSE)
barcode.names = read.delim(barcode.path, header = FALSE, stringsAsFactors = FALSE)
colnames(mat) = barcode.names$V1
rownames(mat) = feature.names$V1

# Seurat

mydata <- CreateSeuratObject(counts = mat)
PRO = UpdateSeuratObject(object = mydata)
SaveH5Seurat(PRO, filename = "GSE136831.h5ad")
Convert("GSE136831.h5ad", dest = "h5ad")`

Thank You!
