Open liubovpashkova opened 4 years ago
I'm planning to convert data input to np.float32 by default, while allowing the user to specify an alternative format via a datatype argument. I just need to figure out a future-proof way of doing so; I am currently considering adding a kwargs argument to ESObject.
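A minimal sketch of what such a default could look like is given below. It is only an illustration of the idea: `prepare_input` is a hypothetical helper, not part of CELLEX, and the keyword name `dtype` is an assumption (the comment above only mentions a datatype argument passed through kwargs).

```python
import numpy as np


def prepare_input(data, **kwargs):
    """Hypothetical helper (not CELLEX API): cast input to float32 by default.

    The caller may override the target type with an assumed `dtype` keyword.
    Works for anything with an `.astype` method, e.g. a NumPy array or a
    pandas DataFrame.
    """
    dtype = kwargs.pop("dtype", np.float32)
    return data.astype(dtype)


# Toy input standing in for an expression matrix.
toy = np.ones((3, 2), dtype=np.float64)

default_cast = prepare_input(toy)                     # float32 by default
explicit_cast = prepare_input(toy, dtype=np.float64)  # user override
print(default_cast.dtype, explicit_cast.dtype)
```

Reading the option from kwargs keeps the positional signature unchanged, so existing calls would keep working if further options were added later.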
Possibly solved by release v1.1.0. I will have to do some testing to find out if we need more aggressive changes.
I recently ran CELLEX on a huge dataset with ~1.3 million cells. To my surprise, I encountered the following error:
MemoryError: Unable to allocate array with shape (1331984, 26182) and data type float64
So the server does not have enough memory to complete the task when the expression matrix is stored as float64 (the default). CELLEX consumes > 50% of RAM (more than 1 TB) and then the analysis stops.
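For a sense of scale, the size of one dense array of the shape in the error message can be worked out directly. The sketch below does only that arithmetic; `dense_bytes` is a helper name introduced here for illustration, and it counts a single copy of the matrix, not whatever intermediate arrays the analysis may create.

```python
import numpy as np

# Shape taken from the MemoryError above.
n_rows, n_cols = 1_331_984, 26_182


def dense_bytes(shape, dtype):
    """Bytes needed for one dense array of the given shape and dtype."""
    rows, cols = shape
    return rows * cols * np.dtype(dtype).itemsize


for dtype in (np.float64, np.float32):
    size = dense_bytes((n_rows, n_cols), dtype)
    print(f"{np.dtype(dtype).name}: {size / 2**30:.1f} GiB")
```

A single float64 copy comes to roughly 260 GiB, and float32 halves that, so a handful of temporary copies is enough to exhaust a machine of this size.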
To the developers: is it really necessary to store the expression matrix as float64? Is such high precision relevant here? Are you sure that float32 would not be sufficient?
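As background for that question, the representational limits of the two types can be inspected with NumPy. This sketch only shows what each format can store; it does not say whether the downstream computations in CELLEX are numerically safe in float32, which would need testing.

```python
import numpy as np

# Approximate decimal precision and machine epsilon of each type.
for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{info.dtype}: ~{info.precision} decimal digits, eps={info.eps}")

# float32 has a 24-bit significand (23 stored bits plus an implicit one),
# so every integer up to 2**24 is stored exactly.
limit = 2 ** (np.finfo(np.float32).nmant + 1)
exact_at_limit = int(np.float32(limit)) == limit
exact_above_limit = int(np.float32(limit + 1)) == limit + 1
print(limit, exact_at_limit, exact_above_limit)
```

So raw integer counts below about 16.7 million survive the conversion unchanged, while non-integer values such as normalized expression keep about 6 to 7 significant decimal digits.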
To users: I was able to solve the problem by converting my gene expression matrix (the variable data in the tutorial) from the default data type float64 to float32 before creating the ESObject, as follows:
data_float32 = data.astype(np.float32)
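The effect of the conversion can be checked on a small toy matrix before applying it to the real data. The sketch below assumes, as in the tutorial, that data is a pandas DataFrame; the random counts and the shape are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the expression matrix (made-up counts, float64 by default).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.poisson(2.0, size=(1000, 50)).astype(np.float64))

data_float32 = data.astype(np.float32)

# Memory used by the values only (index excluded).
bytes_before = data.memory_usage(index=False).sum()
bytes_after = data_float32.memory_usage(index=False).sum()
print(f"float64: {bytes_before} bytes, float32: {bytes_after} bytes")

# Largest absolute difference introduced by the conversion.
max_diff = np.abs(data.to_numpy() - data_float32.to_numpy().astype(np.float64)).max()
print(f"max abs difference: {max_diff}")
```

Note that astype returns a new object, so the float64 original stays in memory until it is deleted or goes out of scope.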
Don’t forget to delete the variables afterwards (we need to save Yggdrasil’s RAM):