Closed jhprinz closed 7 years ago
I realized that some PR closed #23. So let's continue here.
I still need to update the examples. But after that we are good to go and have all the features that we wanted.
Excellent thank you! Let us fully focus on the docs now.
So, examples are up. Please have a look! @nsplattner @thempel @franknoe
I think this is much more powerful now. I will update the docs some more and see to make a decent webpage.
Thank you, will do
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
I just tested this PR as described in the tutorial updated in #34. The following happens when I add the engine to the project generators. Did I miss something?
>>> project.generators.add(engine)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-71-d101ab5a5a33> in <module>()
----> 1 project.generators.add(engine)
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/bundle.pyc in add(self, item)
321 if self._set is not None and item not in self._set:
322 logger.info('Added file of type `%s`' % item.__class__.__name__)
--> 323 self._set.save(item)
324
325 @property
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in save(self, obj)
703
704 try:
--> 705 self._save(obj)
706 self.cache[uuid] = obj
707
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in _save(self, obj)
488
489 def _save(self, obj):
--> 490 dct = self.storage.simplifier.to_simple_dict(obj)
491 self._document.insert(dct)
492 obj.__store__ = self
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in to_simple_dict(self, obj, base_type)
524 '_cls': obj.__class__.__name__,
525 '_obj_uuid': str(UUID(int=obj.__uuid__)),
--> 526 '_dict': self.simplify(obj.to_dict(), base_type),
527 '_id': str(UUID(int=obj.__uuid__)),
528 '_time': int(obj.__time__)}
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
557 '_store': store.name}
558
--> 559 return super(UUIDObjectJSON, self).simplify(obj, base_type)
560
561 def build(self, obj):
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
166 else:
167 result = {
--> 168 key: self.simplify(o) for key, o in obj.iteritems()
169 if key not in self.excluded_keys
170 }
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in <dictcomp>((key, o))
167 result = {
168 key: self.simplify(o) for key, o in obj.iteritems()
--> 169 if key not in self.excluded_keys
170 }
171
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
557 '_store': store.name}
558
--> 559 return super(UUIDObjectJSON, self).simplify(obj, base_type)
560
561 def build(self, obj):
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
145 return None
146 elif type(obj) is list:
--> 147 return [self.simplify(o, base_type) for o in obj]
148 elif type(obj) is tuple:
149 return {'_tuple': [self.simplify(o, base_type) for o in obj]}
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
557 '_store': store.name}
558
--> 559 return super(UUIDObjectJSON, self).simplify(obj, base_type)
560
561 def build(self, obj):
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
125 '_cls': obj.__class__.__name__,
126 '_obj_uuid': str(UUID(int=obj.__uuid__)),
--> 127 '_dict': self.simplify(obj.to_dict(), base_type)}
128 else:
129 return {
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
557 '_store': store.name}
558
--> 559 return super(UUIDObjectJSON, self).simplify(obj, base_type)
560
561 def build(self, obj):
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
166 else:
167 result = {
--> 168 key: self.simplify(o) for key, o in obj.iteritems()
169 if key not in self.excluded_keys
170 }
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in <dictcomp>((key, o))
167 result = {
168 key: self.simplify(o) for key, o in obj.iteritems()
--> 169 if key not in self.excluded_keys
170 }
171
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
552 if not obj._ignore:
553 store = self.storage._obj_store[obj.__class__]
--> 554 store.save(obj)
555 return {
556 '_hex_uuid': hex(obj.__uuid__),
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in save(self, obj)
703
704 try:
--> 705 self._save(obj)
706 self.cache[uuid] = obj
707
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in _save(self, obj)
488
489 def _save(self, obj):
--> 490 dct = self.storage.simplifier.to_simple_dict(obj)
491 self._document.insert(dct)
492 obj.__store__ = self
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in to_simple_dict(self, obj, base_type)
524 '_cls': obj.__class__.__name__,
525 '_obj_uuid': str(UUID(int=obj.__uuid__)),
--> 526 '_dict': self.simplify(obj.to_dict(), base_type),
527 '_id': str(UUID(int=obj.__uuid__)),
528 '_time': int(obj.__time__)}
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/file.pyc in to_dict(self)
368 def to_dict(self):
369 ret = super(File, self).to_dict()
--> 370 if self._file:
371 ret['_file_'] = base64.b64encode(self._file)
372
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/syncvar.pyc in __get__(self, instance, owner)
38 if instance.__store__ is not None:
39 idx = self._idx(instance)
---> 40 value = self._update(instance.__store__, idx)
41 self.values[instance] = value
42 return value
/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/syncvar.pyc in _update(self, store, idx)
23 if store is not None:
24 return store._document.find_one(
---> 25 {'_id': idx}).get(self.name)
26
27 return None
AttributeError: 'NoneType' object has no attribute 'get'
This looks like you did not delete the project before restarting. Some of the internals have changed. Try
Project.delete(proj_name)
If that does not help could you post your engine definition?
What the message says is that you are trying to access an attribute of an object that is marked as stored but does not exist in the DB. That could happen if you reuse a PDB file, e.g. after deleting the project.
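The failure mode in the traceback's last frame (`find_one(...).get(...)`) can be reproduced in a few lines. This is a hypothetical standalone sketch, not adaptivemd code: pymongo's `find_one` returns `None` when no document matches, so chaining `.get()` raises exactly the `AttributeError` shown above.

```python
class FakeCollection:
    """Stands in for a pymongo collection after the project was deleted:
    find_one returns None because the document no longer exists."""
    def find_one(self, query):
        return None

def read_attribute(collection, idx, name):
    # fails when the object is marked as stored but missing from the DB
    return collection.find_one({'_id': idx}).get(name)

def read_attribute_safe(collection, idx, name):
    # defensive variant: treat a missing document as "no value"
    doc = collection.find_one({'_id': idx})
    return doc.get(name) if doc is not None else None

col = FakeCollection()
try:
    read_attribute(col, 42, 'location')
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'get'
print(read_attribute_safe(col, 42, 'location'))  # None
```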
Ahh, thanks, I tried this but probably mixed something up. Now this error is resolved, but followed by another one:
DocumentTooLarge: BSON document too large (20205194 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
I'm using the same system as before and never had problems loading it into the DB. Looks like it is loading everything twice.
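As a back-of-the-envelope check (using the 11.5 MB on-disc figure mentioned below; exact BSON field overheads and which fields get duplicated are ignored, so this won't reproduce the reported byte count exactly): base64 encoding alone inflates the payload by a third, so a single copy barely fits under MongoDB's 16 MiB per-document cap, and a duplicated copy cannot.

```python
import base64

BSON_LIMIT = 16 * 1024 * 1024  # 16777216 bytes, MongoDB's per-document cap

# base64 encodes every 3 input bytes as 4 output bytes (~33% overhead)
payload = b'x' * 11_500_000            # roughly the 11.5 MB of files on disc
encoded = base64.b64encode(payload)

print(len(encoded))                    # 15333336: close to the limit already
print(len(encoded) <= BSON_LIMIT)      # True: one copy still fits
print(2 * len(encoded) <= BSON_LIMIT)  # False: a duplicated copy cannot fit
```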
engine.items()
[('pdb_file_stage', 'init_adaptive.pdb'),
('integrator_file', 'integrator.xml'),
('_executable_file', 'openmmrun.py'),
('system_file_stage', 'system.xml'),
('pdb_file', 'init_adaptive.pdb'),
('integrator_file_stage', 'integrator.xml'),
('_executable_file_stage', 'openmmrun.py'),
('system_file', 'system.xml')]
well, strange... let me see...
All files were stored twice before, but only the ones without `_stage` have content. Could you check that?
for k, v in engine.items():
    print len(v._file) if v._file is not None else 0
This works fine for me. So, I suspect that there is something else getting large.
Can you compare the file sizes with the original files? Just to make sure there is no overhead?
This seems to work and also the files on disc show the same number of characters. They have a total size of 11.5 M on disc, so it should be fine.
>>> for k, v in engine.items():
...     print v.short, len(v._file) if v._file is not None else 0
staging:///init_adaptive.pdb 0
file://{}/integrator.xml 117
file://{}/openmmrun.py 8828
staging:///system.xml 0
file://{}/init_adaptive.pdb 2204265
staging:///integrator.xml 0
staging:///openmmrun.py 0
file://{}/system.xml 8659243
Just scrolled through the above files in my notebook; their content seems fine. Is there anything else being copied?
This is the question. When exactly did this error happen? I assume you ran the setup from the top with the PDB, system.xml, etc., and then got the error when storing the engine? So it cannot have been caused by some other files, right? There are no other files present.
Found the bug/storage inefficiency. The file is really stored twice, which is definitely not intended. Will issue a quick fix. Still, we should make it use the new storage option.
Wow, this was a real tough one, involving the fact that weakref.WeakKeyDictionary uses hashing, which depends on the pymongo _id, which in my implementation is set after object creation. Because the hash changes, you cannot find the same object in the WeakKeyDict again... I should give the next seminar on that one...
No idea how I found this one. That was probably the most hidden error so far...
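The pitfall can be reproduced in a few lines. This is a minimal standalone sketch with a hypothetical `Stored` class, not adaptivemd code: an object whose `__hash__` depends on a mutable `_id` becomes unfindable in a `weakref.WeakKeyDictionary` once the id is assigned.

```python
import weakref

class Stored:
    """Hypothetical object whose hash depends on a database _id that is
    only assigned after creation (as with pymongo's _id)."""
    def __init__(self):
        self._id = None

    def __hash__(self):
        return hash(self._id)

    def __eq__(self, other):
        return self is other

cache = weakref.WeakKeyDictionary()

obj = Stored()
cache[obj] = 'cached value'   # keyed under hash(None)

obj._id = 'ObjectId(abc123)'  # the DB assigns the id after insertion

# The hash changed, so the lookup probes a different bucket and misses:
print(obj in cache)           # False, even though obj was never removed
```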
Unfortunately, #35 contains the fix and now also allows storing arbitrarily large files.
The problem is that when I merge #35, this one will be merged as well... So let's at least finish this discussion. The additions from #35 are extra features, while this PR changes the general concept of trajectories...
I like the description of this task very much. The only point that concerns me a little is the last one (how to featurize), because it creates a relatively hard dependency on PyEMMA and our current naming conventions. There are two issues with this: (1) if you always depend on PyEMMA, the dependencies become very heavy (e.g. you also depend on things like matplotlib, which are clearly irrelevant for this package), and many dependencies also mean many ways for the package to break if dependencies change. (2) Although we don't have a concrete plan for it, it is not impossible that the look and feel of PyEMMA featurization will change at some point; I know there are some deficiencies with the current one.
To address that, please check where you actually depend on PyEMMA and, if possible, find a way to make that dependency optional to your package, i.e. if the user doesn't need a certain functionality (e.g. they write their own analysis class), it shouldn't automatically install PyEMMA. For the second point: since you basically have to look up the PyEMMA API in order to write this pseudocode anyway, why not just use the PyEMMA function names directly (with the 'add_')? In any case this needs to be clearly documented, i.e. add a link to the PyEMMA featurizer in the current API docs.
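One common way to keep such a dependency optional is a lazy import guard. The sketch below is illustrative, not adaptivemd's actual code; the helper name and error message are made up, and it only assumes PyEMMA is installable from PyPI. The package imports fine without PyEMMA; only the PyEMMA-backed feature fails, and with a helpful message.

```python
def _require_pyemma():
    """Import PyEMMA on first use instead of at package import time."""
    try:
        import pyemma
    except ImportError as e:
        raise ImportError(
            'This feature needs PyEMMA; install it with `pip install pyemma`.'
        ) from e
    return pyemma

class PyEMMAAnalysis:  # hypothetical wrapper class
    def __init__(self, topology):
        # fail here, at feature use, not when the package is imported
        self._pyemma = _require_pyemma()
        self.topology = topology
```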
Looking at the examples now...
merging this
This implements all of the discussion in #23 .
You can now make arbitrary mixtures of selection/stride for your engine and run, extend, etc.
The only missing point is that PyEMMA will need either a reduced PDB or a selection string to work with non-full atom sets (this is because my PyEMMA script computed backbone angles, and without a topology this is difficult).
This is pretty neat now. Also intelligent handling of frame numbers etc...
More tomorrow...
New features
- [x] Output types: an engine has `output_types` that you can add. These contain information about striding and selections (atom subsets). You can have an arbitrary set of these output types. Usually you would have a master with the full selection and some stride, plus a subset like `protein` with native stride.
- [x] `Trajectory` objects now require an `engine` property. I first thought to set this when you actually run a trajectory, but it makes sense to set it upon creation. The engine contains information about the topology and the output types, so the trajectory is useless without this information. It also means that you can create the task directly from the trajectory; there are methods `.run` and `.extend` now for that.
- [x] `Engine` now has two commands, `.run()` and `.extend()`, instead of the long names for generating tasks. These are the same for all engines.
- [x] PyEMMA feature support: this is tricky. I added a way to express PyEMMA features. What you do is convert the calls of `featurizer.add_[something](arg1, arg2, ...)` into a dict like `{'add_[something]': [arg1, arg2]}`, where args can again be calls to the featurizer object. This allows basic featurizer construction. If you really need something fancy you have to write your own Analysis class.
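The `{'add_[something]': [args]}` convention can be sketched as follows. The replay helper and recording class below are hypothetical illustrations of the described semantics, not adaptivemd's actual implementation; the method names `add_distances` and `select_Heavy` follow PyEMMA's featurizer naming.

```python
def apply_feature_spec(featurizer, spec):
    """Replay a {'method_name': [args]} dict against a featurizer object.
    Arguments that are themselves dicts are interpreted recursively as
    nested calls on the featurizer (e.g. selection helpers)."""
    result = None
    for name, args in spec.items():
        resolved = [
            apply_feature_spec(featurizer, a) if isinstance(a, dict) else a
            for a in args
        ]
        result = getattr(featurizer, name)(*resolved)
    return result

class RecordingFeaturizer:
    """Stand-in for PyEMMA's featurizer that just records the calls made."""
    def __init__(self):
        self.calls = []
    def __getattr__(self, name):
        def method(*args):
            self.calls.append((name, args))
            return (name, args)
        return method

feat = RecordingFeaturizer()
# equivalent of featurizer.add_distances(featurizer.select_Heavy())
apply_feature_spec(feat, {'add_distances': [{'select_Heavy': []}]})
print(feat.calls)  # select_Heavy is resolved first, then add_distances
```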