memory issue on HMLE_TGFb_day_8_10 example from MAGIC

juexinwang / scGNN

scGNN (single cell graph neural networks) for single cell clustering and imputation using graph neural networks

MIT License


memory issue on HMLE_TGFb_day_8_10 example from MAGIC #23

Open adhikarirsr opened 2 years ago

adhikarirsr commented 2 years ago

python -W ignore PreprocessingscGNN.py --datasetName HMLE_TGFb_day_8_10.csv.gz --datasetDir magic_HMLE/ --LTMGDir magic_HMLE/ --filetype CSV --geneSelectnum 2000

gives

MemoryError: Unable to allocate 1.36 TiB for an array with shape (12417, 15044000) and data type int64

I got the file from here

juexinwang commented 2 years ago

The file is very large, which is common in single-cell analysis. You can either find a machine with at least 1.36 TiB of memory or split the file into several parts and process them separately.
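For reference, the figure in the error message can be reproduced with a quick calculation: rows times columns times 8 bytes per int64 value. The sketch below is only arithmetic on the shape reported in the traceback; the helper name `array_size_bytes` and the choice of 100 parts are illustrative assumptions, not part of scGNN.

```python
def array_size_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for a dense array; itemsize=8 corresponds to int64."""
    return n_rows * n_cols * itemsize


# Shape taken from the MemoryError in this thread.
full = array_size_bytes(12417, 15044000)
print(f"full array: {full / 2**40:.2f} TiB")

# Splitting the second dimension into n_parts pieces divides the
# requirement proportionally (n_parts = 100 is just an example).
n_parts = 100
part = array_size_bytes(12417, 15044000 // n_parts)
print(f"one of {n_parts} parts: {part / 2**30:.1f} GiB")
```

With 100 parts, each piece needs a little under 14 GiB as a dense int64 array, so the number of parts can be chosen to match the memory that is actually available.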

adhikarirsr commented 2 years ago

How can we split the file and process the parts separately? Is this a limitation of the scGNN method?

juexinwang commented 2 years ago

This is just the preprocessing step; scGNN itself has not run yet. The shape (12417, 15044000) probably means 12,417 genes and 15,044,000 cells. Usually you need a large machine to handle that. Splitting means dividing the file into, for example, 100 smaller files, each containing 150,440 cells, and running the imputation on each one individually. You can adjust the number of parts to fit your machine's memory.
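One possible way to do the split without loading the whole matrix is to stream the file row by row and write each slice of columns to its own output file. The following is a minimal sketch using only the Python standard library. It assumes the CSV is gzipped, has a header row, keeps row labels (for example gene IDs) in the first column, and stores one cell per remaining column; if your file is laid out the other way round, split by rows instead. The function name `split_columns` and the output naming scheme are hypothetical and not part of scGNN.

```python
import csv
import gzip
from contextlib import ExitStack


def split_columns(src_path, out_prefix, n_parts):
    """Stream a gzipped CSV and write n_parts files, each with a slice of columns.

    Assumption: column 0 holds row labels and is repeated in every part;
    every other column is one cell. Only one row is held in memory at a time.
    """
    out_paths = [f"{out_prefix}_part{i:03d}.csv.gz" for i in range(n_parts)]
    with ExitStack() as stack:
        src = stack.enter_context(gzip.open(src_path, "rt", newline=""))
        writers = [
            csv.writer(stack.enter_context(gzip.open(p, "wt", newline="")))
            for p in out_paths
        ]
        bounds = None
        for row in csv.reader(src):
            label, values = row[0], row[1:]
            if bounds is None:
                # Compute the column boundaries once, from the header row.
                n = len(values)
                bounds = [
                    (i * n // n_parts, (i + 1) * n // n_parts)
                    for i in range(n_parts)
                ]
            for writer, (lo, hi) in zip(writers, bounds):
                writer.writerow([label] + values[lo:hi])
    return out_paths
```

A call such as `split_columns("magic_HMLE/HMLE_TGFb_day_8_10.csv.gz", "magic_HMLE/HMLE", 100)` would then produce 100 smaller files, each of which can be passed to the preprocessing command through `--datasetName` in turn. Note that gene selection with `--geneSelectnum` would run independently on each part, so the selected genes may differ between parts.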