Allow output types - Githubissues

jhprinz commented 7 years ago

This implements all of the discussion in #23 .

You can now use arbitrary combinations of selection and stride for your engine, and run, extend, etc.

The only point missing is that PyEMMA will need either a reduced PDB or a selection string to work with trajectories that do not contain the full atom set (this is because my PyEMMA script computes backbone angles, and without a matching topology this is difficult).

This is pretty neat now. It also handles frame numbers intelligently, etc...

More tomorrow...

New features

[x] output types: An engine has output_types that you can add. These contain information about striding and selections (atom subsets). You can have an arbitrary set of these output_types. Usually you would have a master with the full selection and some stride, and a subset like protein with the native stride.
```
engine.add_output_type('master', 'master.dcd', stride=10)
engine.add_output_type('protein', 'protein.dcd', stride=1, selection='protein')
```
[x] Trajectory objects now require an engine property. I first thought to set this when you actually run a trajectory, but it makes sense to set it upon creation. The engine contains information about the topology and the output types, so the trajectory is useless without this information. It also means that you can create the task directly from the trajectory. There are now methods .run and .extend for that.
```
task = project.new_trajectory(pdb_file, 100, engine).run()
```
[x] Engine now has two commands, .run() and .extend(), instead of the long names for generating tasks. These are the same for all engines.
[x] pyemma feature support: This is tricky. I added a way to express pyemma features. What you do is convert calls of featurizer.add_[something](arg1, arg2, ...) into a dict like {'add_[something]': [arg1, arg2]}, where the args can again be calls to the featurizer object. This allows basic featurizer construction. If you really need something fancy, you have to write your own Analysis class.
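
A minimal sketch of how such a dict could be turned back into featurizer calls. The helper `apply_call` and the stand-in class `FakeFeaturizer` are hypothetical and only illustrate the idea; the actual adaptivemd implementation may look different. Nested dicts in the argument list are evaluated first, which corresponds to "args can again be calls to the featurizer object".

```python
class FakeFeaturizer(object):
    """Hypothetical stand-in for a featurizer; it only records calls."""

    def __init__(self):
        self.calls = []

    def select_Backbone(self):
        # Pretend these are the backbone atom indices.
        return [0, 1, 2]

    def add_distances(self, indices):
        self.calls.append(('add_distances', list(indices)))


def apply_call(featurizer, spec):
    """Evaluate `spec` against `featurizer`.

    A dict {'method_name': [args]} is treated as a method call,
    anything else is passed through as a literal value.
    """
    if isinstance(spec, dict):
        result = None
        for name, args in spec.items():
            # Arguments may themselves be calls, so evaluate them first.
            evaluated = [apply_call(featurizer, arg) for arg in args]
            result = getattr(featurizer, name)(*evaluated)
        return result
    return spec


feat = FakeFeaturizer()
# Equivalent to: feat.add_distances(feat.select_Backbone())
apply_call(feat, {'add_distances': [{'select_Backbone': []}]})
print(feat.calls)
```

Because the description is plain data (dicts, lists, strings), it can be stored and shipped to a worker without pickling the featurizer itself.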

jhprinz commented 7 years ago

I realized that some PR closed #23. So let's continue here.

jhprinz commented 7 years ago

I still need to update the examples. But after that we are good to go and have all the features that we wanted.

franknoe commented 7 years ago

Excellent thank you! Let us fully focus on the docs now.


jhprinz commented 7 years ago

So, examples are up. Please have a look! @nsplattner @thempel @franknoe

I think this is much more powerful now. I will update the docs some more and see to make a decent webpage.

franknoe commented 7 years ago

Thank you, will do


thempel commented 7 years ago

I just tested this PR as described in the tutorial updated in #34. The following happens when I add the engine to the project generators. Did I miss something?

>>> project.generators.add(engine)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-71-d101ab5a5a33> in <module>()
----> 1 project.generators.add(engine)

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/bundle.pyc in add(self, item)
    321         if self._set is not None and item not in self._set:
    322             logger.info('Added file of type `%s`' % item.__class__.__name__)
--> 323             self._set.save(item)
    324 
    325     @property

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in save(self, obj)
    703 
    704         try:
--> 705             self._save(obj)
    706             self.cache[uuid] = obj
    707 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in _save(self, obj)
    488 
    489     def _save(self, obj):
--> 490         dct = self.storage.simplifier.to_simple_dict(obj)
    491         self._document.insert(dct)
    492         obj.__store__ = self

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in to_simple_dict(self, obj, base_type)
    524             '_cls': obj.__class__.__name__,
    525             '_obj_uuid': str(UUID(int=obj.__uuid__)),
--> 526             '_dict': self.simplify(obj.to_dict(), base_type),
    527             '_id': str(UUID(int=obj.__uuid__)),
    528             '_time': int(obj.__time__)}

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    557                         '_store': store.name}
    558 
--> 559         return super(UUIDObjectJSON, self).simplify(obj, base_type)
    560 
    561     def build(self, obj):

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    166             else:
    167                 result = {
--> 168                     key: self.simplify(o) for key, o in obj.iteritems()
    169                     if key not in self.excluded_keys
    170                 }

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in <dictcomp>((key, o))
    167                 result = {
    168                     key: self.simplify(o) for key, o in obj.iteritems()
--> 169                     if key not in self.excluded_keys
    170                 }
    171 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    557                         '_store': store.name}
    558 
--> 559         return super(UUIDObjectJSON, self).simplify(obj, base_type)
    560 
    561     def build(self, obj):

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    145                 return None
    146         elif type(obj) is list:
--> 147             return [self.simplify(o, base_type) for o in obj]
    148         elif type(obj) is tuple:
    149             return {'_tuple': [self.simplify(o, base_type) for o in obj]}

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    557                         '_store': store.name}
    558 
--> 559         return super(UUIDObjectJSON, self).simplify(obj, base_type)
    560 
    561     def build(self, obj):

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    125                         '_cls': obj.__class__.__name__,
    126                         '_obj_uuid': str(UUID(int=obj.__uuid__)),
--> 127                         '_dict': self.simplify(obj.to_dict(), base_type)}
    128                 else:
    129                     return {

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    557                         '_store': store.name}
    558 
--> 559         return super(UUIDObjectJSON, self).simplify(obj, base_type)
    560 
    561     def build(self, obj):

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    166             else:
    167                 result = {
--> 168                     key: self.simplify(o) for key, o in obj.iteritems()
    169                     if key not in self.excluded_keys
    170                 }

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in <dictcomp>((key, o))
    167                 result = {
    168                     key: self.simplify(o) for key, o in obj.iteritems()
--> 169                     if key not in self.excluded_keys
    170                 }
    171 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    552                 if not obj._ignore:
    553                     store = self.storage._obj_store[obj.__class__]
--> 554                     store.save(obj)
    555                     return {
    556                         '_hex_uuid': hex(obj.__uuid__),

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in save(self, obj)
    703 
    704         try:
--> 705             self._save(obj)
    706             self.cache[uuid] = obj
    707 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in _save(self, obj)
    488 
    489     def _save(self, obj):
--> 490         dct = self.storage.simplifier.to_simple_dict(obj)
    491         self._document.insert(dct)
    492         obj.__store__ = self

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in to_simple_dict(self, obj, base_type)
    524             '_cls': obj.__class__.__name__,
    525             '_obj_uuid': str(UUID(int=obj.__uuid__)),
--> 526             '_dict': self.simplify(obj.to_dict(), base_type),
    527             '_id': str(UUID(int=obj.__uuid__)),
    528             '_time': int(obj.__time__)}

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/file.pyc in to_dict(self)
    368     def to_dict(self):
    369         ret = super(File, self).to_dict()
--> 370         if self._file:
    371             ret['_file_'] = base64.b64encode(self._file)
    372 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/syncvar.pyc in __get__(self, instance, owner)
     38             if instance.__store__ is not None:
     39                 idx = self._idx(instance)
---> 40                 value = self._update(instance.__store__, idx)
     41                 self.values[instance] = value
     42                 return value

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/syncvar.pyc in _update(self, store, idx)
     23         if store is not None:
     24             return store._document.find_one(
---> 25                 {'_id': idx}).get(self.name)
     26 
     27         return None

AttributeError: 'NoneType' object has no attribute 'get'

jhprinz commented 7 years ago

It looks like you did not delete the project before restarting. Some of the internals have changed. Try

Project.delete(proj_name)

If that does not help could you post your engine definition?

What the message says is that you are trying to access an attribute of an object that is marked as stored but does not exist in the DB. That can happen if you reuse a PDB file object, e.g. after deleting the project.
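
The failure mode can be reproduced without a database. In the traceback, `find_one` finds no document and returns `None`, and the chained `.get(...)` then raises. The sketch below uses a hypothetical dict-backed `FakeCollection` in place of the real collection, and `read_attribute` is an illustrative helper, not part of adaptivemd; only the "`find_one` returns `None`" behaviour is taken from the traceback.

```python
class FakeCollection(object):
    """Hypothetical stand-in for a DB collection, backed by a dict."""

    def __init__(self):
        self.docs = {}

    def find_one(self, query):
        # Like the real call in the traceback: None if nothing matches.
        return self.docs.get(query['_id'])


def read_attribute(collection, idx, name):
    """Illustrative guarded lookup with a clearer error message."""
    doc = collection.find_one({'_id': idx})
    if doc is None:
        raise LookupError(
            'object %r is marked as stored but is missing in the DB' % idx)
    return doc.get(name)


collection = FakeCollection()

# Unguarded access, as in syncvar.py line 25 of the traceback.
try:
    collection.find_one({'_id': 'abc'}).get('_file')
except AttributeError as err:
    message = str(err)

print(message)
```

A guard like `read_attribute` would not fix a stale object, but it would point directly at the cause instead of failing deep inside the save call.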

thempel commented 7 years ago

Ahh, thanks, I tried this but probably mixed something up. Now this error is resolved, but it is followed by another one: DocumentTooLarge: BSON document too large (20205194 bytes) - the connected server supports BSON document sizes up to 16777216 bytes. I'm using the same system as before and never had problems loading it into the DB. It looks like it is loading everything twice.

engine.items()

[('pdb_file_stage', 'init_adaptive.pdb'),
 ('integrator_file', 'integrator.xml'),
 ('_executable_file', 'openmmrun.py'),
 ('system_file_stage', 'system.xml'),
 ('pdb_file', 'init_adaptive.pdb'),
 ('integrator_file_stage', 'integrator.xml'),
 ('_executable_file_stage', 'openmmrun.py'),
 ('system_file', 'system.xml')]

jhprinz commented 7 years ago

well, strange... let me see...

jhprinz commented 7 years ago

All files were stored twice before, but only the ones without _stage have content. Could you check that?

for k, v in engine.items():
    print len(v._file) if v._file is not None else 0

This works fine for me. So, I suspect that there is something else getting large.

jhprinz commented 7 years ago

Can you compare the file sizes with the original files? Just to make sure there is no overhead?

thempel commented 7 years ago

This seems to work and also the files on disc show the same number of characters. They have a total size of 11.5 M on disc, so it should be fine.

>>> for k, v in engine.items():
...     print v.short, len(v._file) if v._file is not None else 0

staging:///init_adaptive.pdb 0
file://{}/integrator.xml 117
file://{}/openmmrun.py 8828
staging:///system.xml 0
file://{}/init_adaptive.pdb 2204265
staging:///integrator.xml 0
staging:///openmmrun.py 0
file://{}/system.xml 8659243
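
A rough size estimate, as a sketch only. The traceback shows that file contents go through `base64.b64encode` in `to_dict`, which inflates them by a factor of about 4/3. The code below assumes that all four file contents end up in a single document and ignores any other document overhead; neither assumption is confirmed in this thread, and it does not try to reproduce the exact 20205194 bytes from the error message.

```python
import base64

# Limit quoted in the DocumentTooLarge message.
BSON_LIMIT = 16777216

# Raw sizes in bytes, taken from the listing above.
sizes = {
    'integrator.xml': 117,
    'openmmrun.py': 8828,
    'init_adaptive.pdb': 2204265,
    'system.xml': 8659243,
}


def b64_size(n_bytes):
    """Length of the base64 encoding of n_bytes bytes (with padding)."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity check of the formula against the real encoder.
assert b64_size(117) == len(base64.b64encode(b'x' * 117))

once = sum(b64_size(n) for n in sizes.values())
twice = 2 * once

fits_once = once <= BSON_LIMIT
fits_twice = twice <= BSON_LIMIT

print(once, fits_once)
print(twice, fits_twice)
```

Under these assumptions the encoded files fit into one document when stored once, but not when stored twice, which is consistent with the suspicion that something is duplicated.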

thempel commented 7 years ago

I just scrolled through the above files in my notebook; their content seems fine. Is there anything else being copied?

jhprinz commented 7 years ago

This is the question. When exactly did this error happen? I assume you ran the setup from the top with PDB, system.xml, etc., and then got the error when storing the engine? So it cannot have been caused by some other files, right? There are no other files present.

jhprinz commented 7 years ago

Found the bug/storage inefficiency. The file really is stored twice, which is definitely not intended. I will issue a quick fix. Still, we should make it use the new storage option.

jhprinz commented 7 years ago

Wow, this was a really tough one. It involves the fact that weakref.WeakKeyDictionary uses hashing, and the hash depends on the pymongodb _id, which in my implementation is set after object creation. Because the hash changes, you can no longer find the same object in the WeakKeyDictionary... I should give the next seminar on that one...
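
The effect described above can be shown in isolation. The class `Stored` below is hypothetical and only mimics an object whose hash depends on an id that is assigned late; it is not the adaptivemd code.

```python
import weakref


class Stored(object):
    """Hypothetical object whose hash depends on a late-assigned _id."""

    def __init__(self):
        self._id = 0  # placeholder until the object is saved

    def __hash__(self):
        return hash(self._id)


values = weakref.WeakKeyDictionary()
obj = Stored()

values[obj] = 'cached'
found_before = obj in values

# The id is assigned after the object was already used as a key,
# so the hash used for the lookup no longer matches the stored one.
obj._id = 12345
found_after = obj in values

# The entry is still in the dictionary, it just cannot be found.
entries = len(values)

print(found_before, found_after, entries)
```

The general rule is that anything used as a dictionary key needs a hash that stays constant for the lifetime of the object, for example one fixed at creation time.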

No idea, how I found this one. That was probably the most hidden error so far...

Unfortunately, the fix is part of #35, which now also allows storing arbitrarily large files.

The problem is that when I merge #35, this one will be merged as well... So let's at least finish this discussion. The additions from #35 are additional features, while this PR changes the general concept of trajectories...

franknoe commented 7 years ago

I like the description of this task very much. The only point that concerns me a little bit is the last point (how to featurize), because it creates a relatively hard dependency on PyEMMA and our current naming conventions. There are two issues with this: (1) If you always depend on PyEMMA, this makes the dependencies very heavy (e.g. you also depend on things like matplotlib which are clearly irrelevant for this package) and many dependencies also means there are many ways for the package to break down if dependencies change. (2) Although we don't have a concrete plan for that, it is not impossible that the look+feel of PyEMMA featurization will change at some point. I know there are some deficiencies with the current one.

To address that, please check where you actually depend on PyEMMA and, if possible, find a way to make that dependency optional for your package, i.e. if the user doesn't need a certain functionality (e.g. writes their own analysis class), it shouldn't automatically install PyEMMA. For the second point, since you basically have to look up the PyEMMA API in order to write this pseudocode anyway, why not just use the PyEMMA function names directly (with the 'add_')? In any case this needs to be clearly documented, i.e. add a link to the PyEMMA featurizer in the present API docs.
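
One common way to make such a dependency optional is to import it only at the point of use. The following is a sketch under that assumption; `optional_import` and `build_featurizer` are hypothetical names and not part of adaptivemd.

```python
import importlib


def optional_import(name, feature):
    """Import module `name` lazily, with a readable error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        raise ImportError(
            '%s requires the optional package %r; '
            'install it to use this feature' % (feature, name))


def build_featurizer(topology_file):
    """Only this function needs PyEMMA; importing the package does not."""
    coordinates = optional_import('pyemma.coordinates', 'feature construction')
    return coordinates.featurizer(topology_file)


# Usage with a module from the standard library, so this runs anywhere.
json = optional_import('json', 'demo')
print(json.dumps([1, 2]))
```

With this pattern, users who write their own analysis class never trigger the import, so PyEMMA does not have to be listed as a hard install requirement.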

Looking at the examples now...

jhprinz commented 7 years ago

merging this

markovmodel / adaptivemd

Allow output types #28
