Memory issue in estimateCellCounts() when dealing with large data

hansenlab / minfi

Devel repository for minfi


Memory issue in estimateCellCounts() when dealing with large data #81

Closed: YinanZheng closed this issue 7 years ago

YinanZheng commented 7 years ago

Hi,

We are using the latest devel version of minfi, calling estimateCellCounts() to estimate cell-type proportions from EPIC data. Everything runs smoothly. However, with large datasets (e.g., more than 2,000 samples), the function can easily hit the memory limit, even on our cluster, which offers 256 GB of memory.
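A back-of-the-envelope sketch of why sample count matters: a dense probes-by-samples matrix grows linearly with the number of samples, and every intermediate copy a pipeline holds at once pays that cost again. The probe count (~866,000) and 8-byte values are assumptions for illustration; minfi's actual object layout and storage types may differ.

```python
# Rough in-memory size of one dense numeric matrix (probes x samples).
# Assumes double precision (8 bytes per value), no compression, and an
# approximate EPIC probe count; these are illustrative assumptions only.

BYTES_PER_DOUBLE = 8


def dense_matrix_gb(n_probes: int, n_samples: int,
                    bytes_per_value: int = BYTES_PER_DOUBLE) -> float:
    """Size of an n_probes x n_samples matrix in GB (1 GB = 10**9 bytes)."""
    return n_probes * n_samples * bytes_per_value / 1e9


if __name__ == "__main__":
    n_probes = 866_000  # assumed: roughly the number of EPIC CpG probes
    for n_samples in (500, 1_000, 2_000, 4_000):
        size = dense_matrix_gb(n_probes, n_samples)
        print(f"{n_samples:>5} samples: {size:6.2f} GB per matrix copy")
```

Under these assumptions a single copy at 2,000 samples is about 13.9 GB, so a workflow that keeps several raw, combined, and normalized copies alive at the same time can approach a 256 GB limit.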

We could run estimateCellCounts() plate by plate, but would this introduce bias compared with running it on the whole dataset?
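One reason chunking can matter: the minfi documentation describes estimateCellCounts() as processing the user data and the reference data together (via its processMethod argument, with quantile normalization used by default for blood). If a step normalizes samples jointly, a sample's normalized values depend on which other samples share the call. The toy sketch below uses plain quantile normalization (not minfi's preprocessQuantile, which is more involved) to show that dependence; it says nothing about how large any resulting bias would be in practice.

```python
# Minimal quantile normalization (pure Python; ties broken by position).
# Each inner list is one sample. Every sample is mapped onto the mean of the
# sorted samples, so the output for one sample depends on its batch-mates.

from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize equal-length samples and return new lists."""
    n_features = len(samples[0])
    if any(len(s) != n_features for s in samples):
        raise ValueError("all samples must have the same length")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_samples) / len(samples)
                 for k in range(n_features)]
    normalized = []
    for s in samples:
        # Indices of this sample's features, ordered from smallest to largest.
        order = sorted(range(n_features), key=lambda i: s[i])
        out = [0.0] * n_features
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    a = [0.2, 0.8, 0.5]                 # toy sample shared by both "plates"
    plate1 = [a, [0.1, 0.9, 0.4]]
    plate2 = [a, [0.3, 0.6, 0.7]]
    print("a with plate 1:", quantile_normalize(plate1)[0])
    print("a with plate 2:", quantile_normalize(plate2)[0])
```

The same sample `a` comes out differently depending on which plate it is normalized with, which is the mechanism behind the concern in the question.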

Thanks!

kasperdanielhansen commented 7 years ago

Our long-term plan (within the next 6 months) is to transition to an HDF5 backend, which should enable processing in bounded memory. Otherwise, I have no comments.
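To illustrate what "processing in bounded memory" generally means, here is a minimal sketch of block-wise processing: only one block of samples is held in memory at a time, so peak memory is governed by the block size rather than the total sample count. It does not use HDF5 or any Bioconductor package; `read_block` is a hypothetical stand-in for reading a slice of an on-disk matrix.

```python
# Block-wise per-sample means over a matrix that is never fully loaded.
# `read_block` is hypothetical: it stands in for reading a column slice
# from on-disk storage and returns one list of values per sample.

from typing import Callable, Iterator, List, Sequence

ReadBlock = Callable[[range], Sequence[Sequence[float]]]


def iter_blocks(n_samples: int, block_size: int) -> Iterator[range]:
    """Yield consecutive ranges of sample indices of at most block_size."""
    for start in range(0, n_samples, block_size):
        yield range(start, min(start + block_size, n_samples))


def blockwise_sample_means(n_samples: int, block_size: int,
                           read_block: ReadBlock) -> List[float]:
    """Compute each sample's mean while holding one block at a time."""
    means: List[float] = []
    for cols in iter_blocks(n_samples, block_size):
        block = read_block(cols)  # only this block is in memory
        means.extend(sum(col) / len(col) for col in block)
    return means


def toy_read_block(cols: range) -> List[List[float]]:
    """Toy data source: sample j has four values, all equal to j."""
    return [[float(j)] * 4 for j in cols]


if __name__ == "__main__":
    print(blockwise_sample_means(10, 3, toy_read_block))
```

Per-sample summaries like these are straightforward to compute block by block; steps that pool information across all samples need more care, for example multiple passes over the data.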