markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

[opinion] dynamic caching is more viable than counting memory consumption #147

Closed: fabian-paul closed this issue 4 years ago

fabian-paul commented 9 years ago

The implementation of the 'operate in memory' function in the coordinate module is complicated by the fact that it is impossible to reliably estimate the memory consumption of the transformers. Each transformer calls one or more numpy or mdtraj functions, each of which may allocate dynamic memory. These allocations are rarely documented, so we are forced to inspect the source code of the libraries. They can be as large as the chunk itself.
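
For illustration, here is a hypothetical numpy snippet showing how such a hidden allocation arises (array shapes are chosen arbitrarily):

```python
import numpy as np

chunk = np.random.rand(250_000, 100)  # one pipeline chunk, ~200 MB

# The intermediate (chunk - chunk.mean(axis=0)) is an unnamed temporary
# the same size as the chunk itself: peak memory for this one line is
# roughly twice the chunk size, and nothing in the docs advertises that.
scaled = (chunk - chunk.mean(axis=0)) / chunk.std(axis=0)
```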

Additionally, the application runs on a multitasking operating system, so querying the amount of free memory and subtracting the amount our application is expected to consume is not a reliable estimate of the free computer memory. Opening a tab in the Firefox browser may suddenly decrease the free memory by dozens of megabytes. Combined with our memory-greedy strategy of putting as much data into the RAM as possible, this is a recipe for disaster.
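
For example, the naive estimate criticized here might look like this sketch (using psutil; the budget factor is an arbitrary assumption):

```python
import psutil

# Snapshot of "available" memory at one instant. Any other process may
# consume this headroom right after we read it, so sizing an in-memory
# store from this number is inherently unreliable.
available = psutil.virtual_memory().available
budget = int(0.5 * available)  # leave headroom; still only a guess
```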

One solution to this problem is to give our application the ability to 'back off' and return memory to the operating system in times of memory shortage. All operating systems have an API that reports memory shortage to programs. We could interface with this API and replace 'operate in memory' with a data cache. When there is no memory shortage, all data is held in memory; in case of a shortage, the cache is emptied and some chunks have to be recomputed.

Here are some possible guidelines for the implementation of such a cache:

So this simplest form of caching, based on access times (essentially least-recently-used eviction), should give good performance when combined with the transformation pipeline; see the sketch below.
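
A minimal sketch of what such an access-time-based chunk cache could look like (illustrative only, not PyEMMA API; `compute_chunk` stands in for whatever upstream transformation produces a chunk):

```python
import time

class ChunkCache:
    """Cache computed chunks; evict by least recent access on demand."""

    def __init__(self, compute_chunk):
        self._compute = compute_chunk
        self._store = {}  # chunk index -> (last access time, data)

    def get(self, index):
        if index in self._store:
            _, data = self._store[index]
        else:
            data = self._compute(index)  # cache miss: recompute the chunk
        self._store[index] = (time.monotonic(), data)
        return data

    def shrink(self, n_evict):
        """Back off: drop the n least recently accessed chunks so the
        memory can eventually be returned to the operating system."""
        oldest = sorted(self._store, key=lambda i: self._store[i][0])
        for index in oldest[:n_evict]:
            del self._store[index]
```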

franknoe commented 9 years ago

I think caching is a very good idea, but I would guess it is a major effort to do it. Am I right?



fabian-paul commented 9 years ago

Trying this out is not so difficult. Maybe we will run into some unforeseen problem, like the Python garbage collector not giving memory back to the operating system, etc. But this is hard to say in advance. The question is which is harder: implementing caching reliably, or getting the old 'operate in memory' idea to work reliably?

franknoe commented 9 years ago

I really don't know. It's worthwhile to play around with it to get a feeling.




marscher commented 9 years ago

This might be interesting for building cache memory-overflow callbacks on Linux.

Maybe there is a similar technique available on OSX.

https://pypi.python.org/pypi/cgroup-utils/0.5

cgroups provide a callback interface that informs userspace software when free memory falls below a certain threshold.

So this is a nice possibility to avoid swapping the system to death when maximizing chunk sizes and the amount of precomputed frames.
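
For reference, a minimal sketch of the raw cgroup-v1 threshold notification that a package like cgroup-utils builds on (Linux only; the cgroup path and threshold are assumptions, and os.eventfd requires Python >= 3.10):

```python
import os

CGROUP = "/sys/fs/cgroup/memory/pyemma"  # hypothetical cgroup, created beforehand
THRESHOLD = 2 * 1024 ** 3                # e.g. notify once usage crosses 2 GiB

efd = os.eventfd(0)
usage_fd = os.open(os.path.join(CGROUP, "memory.usage_in_bytes"), os.O_RDONLY)
with open(os.path.join(CGROUP, "cgroup.event_control"), "w") as ctl:
    # "<eventfd> <usage fd> <threshold>" registers the threshold event
    ctl.write(f"{efd} {usage_fd} {THRESHOLD}")

os.read(efd, 8)  # blocks until the kernel signals the threshold was crossed
print("memory threshold crossed: empty the chunk cache now")
```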

I strongly vote for this possibility of using only a maximum memory threshold to trigger cache clearance. Maybe we could also estimate cache hit rates analytically beforehand? We could use that result to decide how much effort we want to invest here.
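
As a rough feel for such an estimate, here is a tiny illustrative simulation (not PyEMMA code) of an LRU chunk cache under repeated sequential passes over the data:

```python
from collections import OrderedDict

def lru_hit_rate(n_chunks, capacity, passes=3):
    """Simulate an LRU chunk cache under cyclic sequential access."""
    cache, hits, total = OrderedDict(), 0, 0
    for _ in range(passes):
        for chunk in range(n_chunks):
            total += 1
            if chunk in cache:
                hits += 1
                cache.move_to_end(chunk)  # refresh access time
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict least recently used
                cache[chunk] = None
    return hits / total
```

With strictly cyclic access, LRU always evicts exactly the chunk that is needed next, so `lru_hit_rate(100, 99)` is 0.0 while `lru_hit_rate(100, 100)` is about 0.67; this matches the point below that the cache has to be fairly large to pay off.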

But it really seems promising to avoid double computations with a fairly large cache!

franknoe commented 9 years ago

We should pick this up again after the API redesign. I am not sure if this is still an issue.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.