adhikarirsr opened this issue 2 years ago.
The file is very large, which is common in single-cell analysis. You can either find a machine with at least 1.36 TiB of memory or split the file into several parts and process them separately.
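The 1.36 TiB figure is the size of a dense int64 matrix with the shape in the error: rows × columns × 8 bytes. The sketch below reproduces that arithmetic. It also estimates how many cell-wise parts keep each block under a chosen memory budget. The helper names and the 16 GiB budget are hypothetical examples. Only the raw array is counted, so actual peak memory during preprocessing may be higher if intermediate copies are made.

```python
TIB = 2**40
GIB = 2**30


def dense_bytes(n_genes, n_cells, itemsize=8):
    """Bytes for a dense n_genes x n_cells array (itemsize=8 matches int64)."""
    return n_genes * n_cells * itemsize


def parts_needed(n_genes, n_cells, budget_bytes, itemsize=8):
    """Fewest cell-wise parts whose dense block stays within budget_bytes."""
    # How many full cell columns fit in the budget.
    cells_per_part = budget_bytes // (n_genes * itemsize)
    if cells_per_part == 0:
        raise ValueError("budget cannot hold even one cell column")
    return -(-n_cells // cells_per_part)  # ceiling division


print(f"{dense_bytes(12417, 15044000) / TIB:.2f} TiB")        # 1.36 TiB
print(f"{dense_bytes(12417, 150440) / GIB:.2f} GiB per part")  # 13.92 GiB per part
print(parts_needed(12417, 15044000, 16 * GIB))                 # 87
```

With 100 parts of 150,440 cells, each raw block is about 14 GiB. Leave some headroom above that when picking the number of parts.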
How can we split the file and process the parts separately? Is this a limitation of the scGNN method?
This is just the preprocessing step; it does not touch scGNN yet. The shape (12417, 15044000) likely corresponds to 15,044,000 cells and 12,417 genes, and a matrix that size usually needs a big machine. Splitting means dividing the file into, for example, 100 smaller files, each containing 150,440 cells, and running the imputation on each one individually. You can change the number of parts to suit your machine's memory.
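The splitting step could look like the sketch below, which uses only the standard library. It assumes the CSV is oriented genes × cells, as the shape above suggests. It also assumes the first row holds cell names and the first column holds gene names. Both are unconfirmed assumptions about this file. `split_by_cells` and the `part_NNN.csv.gz` naming are hypothetical. The input is streamed one row at a time, and every part keeps all gene rows but only a contiguous block of cell columns. The full matrix is therefore never loaded.

```python
import csv
import gzip
import itertools
import os


def split_by_cells(in_path, out_dir, n_parts):
    """Split a genes x cells .csv.gz into at most n_parts files by cell columns.

    Assumes (hypothetically) row 0 is a header of cell names, column 0 holds
    gene names, and every other field is a count.
    """
    os.makedirs(out_dir, exist_ok=True)
    with gzip.open(in_path, "rt", newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        n_cells = len(header) - 1
        # Ceiling division so every cell lands in a part; the last part may be smaller.
        size = -(-n_cells // n_parts)
        bounds = [(s, min(s + size, n_cells)) for s in range(0, n_cells, size)]
        paths = [os.path.join(out_dir, f"part_{i:03d}.csv.gz") for i in range(len(bounds))]
        outs = [gzip.open(p, "wt", newline="") for p in paths]
        try:
            writers = [csv.writer(f) for f in outs]
            # Put the header back in front so each part gets its own cell names.
            for row in itertools.chain([header], reader):
                for writer, (start, stop) in zip(writers, bounds):
                    # Keep the gene-name column, then this part's block of cells (offset by 1).
                    writer.writerow([row[0]] + row[1 + start:1 + stop])
        finally:
            for f in outs:
                f.close()
    return paths
```

For example, `split_by_cells("magic_HMLE/HMLE_TGFb_day_8_10.csv.gz", "magic_HMLE/parts", 100)` would write up to 100 files. Each could then be passed through the same `--datasetName`/`--datasetDir` options used in the original command. Peak memory is roughly one parsed row of about 15 million strings, far below 1.36 TiB. Gene selection (`--geneSelectnum`) would then run on each part separately, so the selected genes may differ between parts.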
python -W ignore PreprocessingscGNN.py --datasetName HMLE_TGFb_day_8_10.csv.gz --datasetDir magic_HMLE/ --LTMGDir magic_HMLE/ --filetype CSV --geneSelectnum 2000
gives the error:
MemoryError: Unable to allocate 1.36 TiB for an array with shape (12417, 15044000) and data type int64
I got the file from here.
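The reply above says the shape "likely" means 15,044,000 cells. Before splitting, it may be worth confirming that the file really has that many columns. The sketch below counts data rows and value columns by streaming the gzipped CSV instead of loading it. `csv_gz_shape` is a hypothetical helper. It assumes one header row and one leading row-name column, and counts neither.

```python
import csv
import gzip


def csv_gz_shape(path):
    """Return (n_data_rows, n_value_columns) of a gzipped CSV by streaming it."""
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.reader(f)
        # Header length minus the row-name column gives the number of value columns.
        n_cols = len(next(reader)) - 1
        # Count the remaining rows without keeping them.
        n_rows = sum(1 for _ in reader)
    return n_rows, n_cols
```

A result of `(12417, 15044000)` would confirm 12,417 gene rows and 15,044,000 cell columns. A very different result would suggest the file is not being parsed the way the preprocessing script expects.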