Closed franknoe closed 8 years ago
A comment to point 2: after reading https://www.sqlite.org/howtocorrupt.html I'm not sure whether it is a good idea to try storing this information on a shared file system.
I'm very open to drop the database / persistent storage for now. If it can work robustly, it's nice to have of course. @clonker what do you think?
Am 14/04/16 um 21:23 schrieb Martin K. Scherer:
A comment to point 2: after reading https://www.sqlite.org/howtocorrupt.html I'm not sure whether it is a good idea to try storing this information on a shared file system.
— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/760#issuecomment-210109579
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
Phone: (+49) (0)30 838 75354 Web: research.franknoe.de
BTW, Point 3: What does currently happen if multiple PyEMMA processes work in the same directory? I guess they won't write into the same logfile, will they?
Since newer versions of sqlite (sqlite3), it is supposed to be thread safe (see, e.g., here). When opening a connection to the database in python, there should be a named argument check_same_thread
which should prevent corruption in a multithreaded environment.
A simpler approach would maybe to have a small tool that generates such a database for a fixed set of trajectories which then can be loaded into pyemma..
Thread safety is not the crucial point here, but multi-process (write-) access. There you will most likely corrupt the database file because file locking is not well implemented on network filesystems like NFS. To point 3: Things would also be messed up here, since if the log file is in the same shared location, multiple processes will open the log file for writing. So we need to add a unique pattern to the log file name to avoid this.
All issues are now fixed by #774. I've turned of the persistence of the cache completely for the upcoming version. Before changing this again, we need to discuss whether it is better to have this handled by TrajDB.
Sorry for being unclear - at least we should discuss this issue in some detail before pushing a release.
Am 18/04/16 um 20:07 schrieb Martin K. Scherer:
All issues are now fixed by #774 https://github.com/markovmodel/PyEMMA/pull/774. I've turned of the persistence of the cache completely for the upcoming version. Before changing this again, we need to discuss whether it is better to have this handled by TrajDB.
— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/760#issuecomment-211506392
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
Phone: (+49) (0)30 838 75354 Web: research.franknoe.de
Am 18.04.2016 um 20:26 schrieb Frank Noe:
Sorry for being unclear - at least we should discuss this issue in some detail before pushing a release. Right. I thought you want me to push it ASAP, since it fixes a critical problem.
https://github.com/markovmodel/PyEMMA/issues/775#issuecomment-211293446
- So is the trajectory cache only in memory now, or is there some way to activate it? How does that behavior affect the calculations @gph82 mentioned? Yes, he will not suffer from exceptions due to cache access now since it is only stored in memory.
- Is the new behavior even tested in practical use? Please clarify. We have a lot of test cases. The cache used to life in memory before persistence got into play. So it is just a switch back to the old behaviour - so the answer should be from my point of view yes. If you ask for a relase candidate cycle or whatever for a service release, which should be tested intensively, then please tell me in advance.
Am 18/04/16 um 20:46 schrieb Martin K. Scherer:
Am 18.04.2016 um 20:26 schrieb Frank Noe:
Sorry for being unclear - at least we should discuss this issue in some detail before pushing a release. Right. I thought you want me to push it ASAP, since it fixes a critical problem.
https://github.com/markovmodel/PyEMMA/issues/775#issuecomment-211293446 I know, but I didn't mean to cut short on testing and discussion of issues - otherwise we just introduce new problems.
- So is the trajectory cache only in memory now, or is there some way to activate it? How does that behavior affect the calculations @gph82 mentioned? Yes, he will not suffer from exceptions due to cache access now since it is only stored in memory. Right, but is there any way to make the cache persistent now, or is this switched off? If I understood Guille correctly, it was important for his application that he has the possibility to use the trajectory cache, and we had discussed to switch it on optionally.
- Is the new behavior even tested in practical use? Please clarify. We have a lot of test cases. The cache used to life in memory before persistence got into play. So it is just a switch back to the old behaviour - so the answer should be from my point of view yes. If you ask for a relase candidate cycle or whatever for a service release, which should be tested intensively, then please tell me in advance. What I meant is - has anyone at least installed and worked with this version.
Let's play with it a bit before releasing this patch.
— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/760#issuecomment-211521979
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
Phone: (+49) (0)30 838 75354 Web: research.franknoe.de
Address issues/policies related to creation of files that log, write config or cache trajectories. Since these issues are closely related, I summarize them here. Feel free to split them into separate issues if more appropriate.
~/.pyemma/
or has explicitly requested to write that file by something likepyemma.config.save()
. By default, PyEMMA should not try to create a PyEMMA directory or write a config file. In order to inform the user, the state of the configuration setting should be logged (at a level that is visible by default) when PyEMMA is imported. For example:PyEMMA v2.1. No configuration file found, defaults are used
orPyEMMA v2.1. Using configuration found at ~/.pyemma/pyemma.cfg
~/.pyemma
. If either of these conditions are not met, or storage creation is unsuccessful, caching information will be be stored persistently during this session. We need a way to clear up the cache. Suggested solution: use a queue for cached entries and maximize the total cache size (e.g. 100 MB).The overall picture is that PyEMMA should only try to create files with useful information if explicitly demanded. Even in this case, creation of files should always be attempted with a fallback option, such that we don't generate crashes when we can't access the disk. However, in order to make it easy for the average user to change settings, I suggest a global "look and feel" switch such as
pyemma.config.make_my_life_easy(default=False)
(please find better name). If set to false, I suggest reporting that at import time (visibly), such that it's obvious how to activate easy life.