PyEmma Analysis - Githubissues

thempel commented 7 years ago

For actual adaptive simulations, a flexible way to analyze the data is very important. Apart from choosing input parameters such as the lag time and MSM states, it will be necessary to modify, e.g., the input features. Modifying msmanlyze.py, which is effectively part of the adaptivemd source code, is not very convenient. It is also a bit risky to give the user the standard alanine dipeptide analysis by default, because they must opt out to avoid results that run without error but are meaningless. For more complex types of trajectories, even more options need to be considered. I would suggest either

a script-based solution that allows the user to write custom scripts, perhaps by adding the script's path to a modeller object. This would also make it possible to keep track of which modeller was used to generate which trajectories.

or

a function-based solution: I don't know if this even works, but it might be even better to define custom analysis functions that must take a given set of input parameters and produce output in a given shape. I am thinking of something similar to PyEmma's featurizer.add_custom_function(). It would let the user see directly which keyword arguments can be chosen. Further, the function could be stored in the database, I suppose, making it easy to keep track of the strategy used. It might also be easier to add this to the "brain"...
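A minimal sketch of what the function-based idea could look like, using only the standard library. Everything here is hypothetical: `register_analysis`, `analysis_options`, `run_analysis` and the toy `mean_per_trajectory` function are illustrative names, not part of adaptivemd or PyEMMA. The assumed contract is that the first parameter receives the trajectories and every other parameter is a keyword option with a default, so the available options can be read from the signature.

```python
import inspect
from typing import Any, Callable, Dict

# Hypothetical registry; none of these names exist in adaptivemd or PyEMMA.
_ANALYSES: Dict[str, Callable[..., Dict[str, Any]]] = {}


def register_analysis(func):
    """Register a custom analysis function under its own name.

    Assumed contract: the first parameter receives the trajectories and
    every further parameter is a keyword option with a default value.
    """
    params = list(inspect.signature(func).parameters.values())
    if not params:
        raise TypeError("analysis function must accept the trajectories")
    for p in params[1:]:
        if p.default is inspect.Parameter.empty:
            raise TypeError(f"option {p.name!r} needs a default value")
    _ANALYSES[func.__name__] = func
    return func


def analysis_options(name):
    """Return the keyword options (with defaults) that an analysis accepts."""
    params = list(inspect.signature(_ANALYSES[name]).parameters.values())
    return {p.name: p.default for p in params[1:]}


def run_analysis(name, trajectories, **options):
    """Run a registered analysis and record which function and options were used."""
    result = _ANALYSES[name](trajectories, **options)
    if not isinstance(result, dict):
        raise TypeError("analysis functions must return a dict")
    # This record is what one might store in the database.
    return {"analysis": name, "options": options, "result": result}


@register_analysis
def mean_per_trajectory(trajectories, stride=1):
    """Toy stand-in for a real analysis: mean of every stride-th frame."""
    means = []
    for traj in trajectories:
        frames = traj[::stride]
        means.append(sum(frames) / len(frames))
    return {"means": means}
```

Calling `analysis_options("mean_per_trajectory")` lists the selectable keyword arguments, and the dictionary returned by `run_analysis` keeps the function name and options together with the result, which is one way to track the strategy used.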

franknoe commented 7 years ago

I agree. I thought it was possible to pass arbitrary code. The user should be able to access the full database information and just tell the framework which starting conditions should be selected next.


--

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

nsplattner commented 7 years ago

For basic model building functionality required for adaptive sampling it would be sufficient to slightly extend the options of remote_analysis(). The minimal functionality includes the following options:

featurizer selection (e.g. 'add_all', 'add_backbone_torsion')
transformation (e.g. None or TICA)
TICA options (lag, kinetic variance or number of dimensions)
clustering method (k-means or regspace + metric, cutoff or number of clusters)
MSM lagtime

If these options can be passed, most cases will be covered. For anything more complicated, a custom function or an additional script could be used.
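The options listed above could be collected in one small, validated object. The following is only a sketch under assumptions: `AnalysisOptions` and all of its field names and default values are hypothetical and are not the actual signature of remote_analysis(); lag times are assumed to be given in frames.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AnalysisOptions:
    """Hypothetical option set; field names are illustrative, not adaptivemd's API."""

    # Featurizer method name -> keyword arguments for that method.
    features: Dict[str, dict] = field(
        default_factory=lambda: {"add_backbone_torsions": {}}
    )
    transformation: Optional[str] = "tica"    # None or "tica"
    tica_lag: int = 10                        # assumed unit: frames
    tica_var_cutoff: Optional[float] = 0.95   # kinetic variance to retain ...
    tica_dim: Optional[int] = None            # ... or a fixed number of dimensions
    clustering: str = "kmeans"                # "kmeans" or "regspace"
    n_clusters: Optional[int] = 100           # used by k-means
    dmin: Optional[float] = None              # distance cutoff used by regspace
    metric: str = "euclidean"
    msm_lag: int = 10                         # assumed unit: frames

    def validate(self):
        """Raise ValueError on inconsistent combinations, else return self."""
        if self.transformation not in (None, "tica"):
            raise ValueError("transformation must be None or 'tica'")
        if self.transformation == "tica":
            # Exactly one of the two ways to choose the dimension may be set.
            if (self.tica_var_cutoff is None) == (self.tica_dim is None):
                raise ValueError("give exactly one of tica_var_cutoff or tica_dim")
        if self.clustering == "kmeans":
            if self.n_clusters is None:
                raise ValueError("k-means needs n_clusters")
        elif self.clustering == "regspace":
            if self.dmin is None:
                raise ValueError("regspace needs a cutoff (dmin)")
        else:
            raise ValueError("clustering must be 'kmeans' or 'regspace'")
        if self.tica_lag < 1 or self.msm_lag < 1:
            raise ValueError("lag times must be positive")
        return self
```

Validating up front would catch contradictory settings (for example regspace without a cutoff) before any remote job is submitted.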

franknoe commented 7 years ago

I think it's important not just to be able to select from a few options, but to be able to develop new strategies. For that we effectively need to be able to access the data. The form of the decision making can be standardized, such as returning a set of selected starting points.
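As an illustration of such a standardized return value, here is a minimal sketch of one possible strategy. It assumes the strategy receives discrete trajectories (lists of state indices) and returns (trajectory index, frame index) pairs. The function name and the choice of a least-visited-states heuristic are assumptions made for this example, not the strategy used in adaptivemd.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def select_starting_points(
    dtrajs: Sequence[Sequence[int]], n_points: int
) -> List[Tuple[int, int]]:
    """Pick one frame from each of the n_points least-visited states.

    Returns (trajectory index, frame index) pairs, rarest state first.
    """
    # How often each discrete state was visited over all trajectories.
    counts = Counter(state for dtraj in dtrajs for state in dtraj)

    # Remember the last frame in which each state was seen.
    last_seen: Dict[int, Tuple[int, int]] = {}
    for traj_index, dtraj in enumerate(dtrajs):
        for frame_index, state in enumerate(dtraj):
            last_seen[state] = (traj_index, frame_index)

    # Rarest states first; ties are broken by state index for reproducibility.
    ranked = sorted(counts, key=lambda s: (counts[s], s))
    return [last_seen[s] for s in ranked[:n_points]]
```

Any other strategy could be swapped in as long as it returns the same kind of list, which keeps the framework independent of how the decision is made.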




jhprinz commented 7 years ago

featurizer selection (e.g. 'add_all', 'add_backbone_torsion')

This is in there now.

I think we should cover the usual suspects. If you want something really fancy you can always write your own analysis code. NP. But most people will want to use PyEmma in some standard ways like we teach in the courses (as @nsplattner listed). Features are in there now. TICA is always on but has some options. Clustering should be selectable, but so far we only have n_states. MSM lagtime is in there.

nsplattner commented 7 years ago

I'm not sure how the function remote_analysis() is supposed to work. The choice of features seems to be hardcoded (line 44, feat.add_backbone_torsions()). Is this supposed to be an example, or is it customizable? How can arguments be passed to the featurizer? If it's an example, it should not be in the main code but rather in the tutorial directory.

jhprinz commented 7 years ago

This was an example where I hardcoded it. It should be obvious what to change. Unfortunately, PyEMMA does not provide a way to store a feature description, but the upcoming PR #28 will change that.

nsplattner commented 7 years ago

O.k., thanks for the details! It is obvious what to change; the problem is that a) it's not clear that this is an example, since it's placed in the package, and b) it's not convenient to have a custom function placed in the package, since it's lost when the code is updated.
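One way to keep a custom analysis outside the package, so that it survives updates, would be to load it from a user-supplied script path. This is only a sketch using the standard library: `load_user_analysis` and the default entry-point name `analyze` are hypothetical and not part of adaptivemd.

```python
import importlib.util
from pathlib import Path


def load_user_analysis(script_path, func_name="analyze"):
    """Import a user script by path and return its analysis function.

    The script lives outside the installed package, so updating the
    package does not overwrite it.
    """
    path = Path(script_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load analysis script from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the user's script once

    func = getattr(module, func_name, None)
    if not callable(func):
        raise AttributeError(f"{path} does not define a callable {func_name!r}")
    return func
```

Storing the script path alongside the generated trajectories would also record which analysis produced them.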

jhprinz commented 7 years ago

Sorry for the confusion. It was not originally planned to turn it into a package. I did that to make it easier for you guys. All the additional work, including cleanups and documentation, is kind of hard to do in two weeks' time.

PR #28 and #35 will solve that problem and allow much more customization, though.

markovmodel / adaptivemd

PyEmma Analysis #19
