westpa / west_tools

Supporting analysis tools for WESTPA (legacy; now merged into westpa)

Interactive, online analysis tool (devname: w_ipython) #18

Open astatide opened 8 years ago

astatide commented 8 years ago

Hi all,

Inside of the DEVELOPMENT branch of west_tools, I've started work on an interactive tool to ease interactive and automated analysis; the current name is w_ipython. I'm totally up for a different name.

The idea stemmed from the fact that we routinely needed to access the raw HDF5 data, either to debug a simulation or to analyze it in a way that the tools don't currently (and probably won't ever) support. It would have been nice, I figured, to have a script that would just load the main HDF5 file (typically west.h5), sparing you from importing numpy and h5py, loading up the iterations by hand, and so on, and maybe throw in a few convenience functions.
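
For context, the boilerplate being automated looks roughly like this (a sketch; the `iterations/iter_XXXXXXXX` group layout and `seg_index` dataset reflect a typical west.h5, but the exact paths here are assumptions, not this tool's code):

```python
# A sketch of the manual h5py boilerplate the tool is meant to replace.
# The 'iterations/iter_%08d' group naming matches a typical west.h5 layout.
import h5py

def summarize_iteration(h5path, n_iter):
    """Open a WESTPA HDF5 file and report what one iteration contains."""
    with h5py.File(h5path, 'r') as f:
        grp = f['iterations/iter_%08d' % n_iter]
        names = sorted(grp.keys())          # datasets stored for this iteration
        n_segs = grp['seg_index'].shape[0]  # one row per walker/segment
        return names, n_segs
```

Every analysis session starts with some variant of this; having a single object do it once (and cache the results) is the whole point.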

It sort of grew from there. It currently looks through the main configuration file (west.cfg), pulls in analysis parameters, runs functions that it needs to, and drops you at an ipython prompt with a 'w' object that contains all the information from your simulation and the analysis you've selected to do.

The initial and current development goals, as well as their implementation, are as follows.

Some issues that would need to be ironed out before release:

  1. The default parameters are set up the way I found them convenient (cumulative evolution with a step size of 1). That's probably fine as a default, but it means I hardcoded a few 'this is always going to be an evolution plot with a step size of 1' assumptions here and there that need to change. It should work with a bit of tweaking, but I still need to do said tweaking.
  2. It can currently only construct rectilinear bin mappers, with code blatantly ripped from w_assign. I should either call that function from w_assign directly to de-duplicate the code, or think of a better way to do it altogether.
  3. MORE DOCUMENTATION.
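
For the curious, one-dimensional rectilinear bin assignment itself boils down to very little; the mapper essentially wraps something like this (a numpy sketch, not the actual RectilinearBinMapper code; the boundaries mirror the config example below):

```python
# What a one-dimensional rectilinear bin mapper boils down to:
# assign each progress-coordinate value to the interval that brackets it.
import numpy as np

boundaries = [0.0, 4.0, 10.0, 100000.0]  # same boundaries as the config below
coords = np.array([3.99, 4.0, 50.0])
# np.digitize returns 1-based interval indices for ascending boundaries,
# so subtract 1 to get 0-based bin assignments
assignments = np.digitize(coords, boundaries) - 1
print(assignments.tolist())  # [0, 1, 2]
```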

For the most part, it calls functionality from existing code whenever it can, so it should be easy enough to maintain.

A few screenshots or configuration options, for the unbelievers:

Inside my west.cfg:

  w_ipython:
    directory: ANALYSIS
    postanalysis: True
    w_kinavg:
      bootstrap: True
    analysis_schemes:
      BOUND:
        enabled: True
        states:
          - label: unbound
            coords: [[10.0]]
          - label: bound
            coords: [[3.99]]
        bins:
          - type: RectilinearBinMapper
            boundaries: [[0.0,4.0,10.00,100000]]
      NOCORREL:
        enabled: True
        w_kinavg:
          bootstrap: True
          correl: False
        states:
          - label: unbound
            coords: [[10.0]]
          - label: bound
            coords: [[3.99]]
        bins:
          - type: RectilinearBinMapper
            boundaries: [[0.0,4.0,10.00,100000]]
      PROB:
        enabled: True
        w_kinavg:
          bootstrap: True
          correl: False
        states:
          - label: unbound
            coords: [[10.0]]
          - label: bound
            coords: [[3.99]]
        bins:
          - type: RectilinearBinMapper
            boundaries: [[0.0,4.0,100000]]
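
Under the hood, reading that section is just YAML parsing; something like this sketch (PyYAML assumed; the nesting matches the snippet above, though in a real west.cfg the w_ipython block may sit under additional keys):

```python
# Sketch: pull the enabled analysis schemes out of a w_ipython config section.
import yaml

def enabled_schemes(cfg_text):
    """Return the names of analysis schemes marked enabled, sorted."""
    cfg = yaml.safe_load(cfg_text)
    schemes = cfg['w_ipython']['analysis_schemes']
    return sorted(name for name, s in schemes.items() if s.get('enabled'))

example = """
w_ipython:
  directory: ANALYSIS
  analysis_schemes:
    BOUND:
      enabled: True
    NOCORREL:
      enabled: False
"""
print(enabled_schemes(example))  # ['BOUND']
```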

Startup, selecting an iteration, and what's available in the current iteration: [screenshot]

Plotting from state 0 to 1 from the reweighting code: [screenshot]

Output from a trace. Easily plotted with pyplot, if one chose to do so:

[screenshot]

Comments, suggestions, criticisms, design suggestions, usability concerns, etc., are all appreciated. It's worth noting that all the tools have been updated so that they can run according to a particular 'analysis scheme' (in addition to their normal functionality), so it should be easy to integrate into an existing workflow. One can also pass the 'analyze only' flag to just run everything and call it a day.

Adam

astatide commented 8 years ago

An output of the help, to give you an idea of the sort of information it exposes:

[screenshot]

synapticarbors commented 8 years ago

@ajoshpratt It's an interesting idea, and certainly with newer versions of IPython (>5.0) that have multi-line editing, it could be helpful. I know, however, that I tend to do most of my analysis in a Jupyter notebook when possible. That usually involves moving a copy of the data (or some relevant intermediate result) to my local machine, so I can see the advantage of having something that can be run remotely from the command line. What I never explored, but might be relevant, is running a remote Jupyter notebook kernel and then attaching a local browser to it, so you get the best of both worlds:

http://jupyter-notebook.readthedocs.io/en/latest/public_server.html

Again, I've never done this, so there might be some major limitations, but maybe it's worth looking at so users could potentially leverage all of the niceties of the notebook and also have full-fledged plotting capabilities.
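
For reference, the remote-kernel setup described in that link usually comes down to two commands (host name, user, and port below are placeholders):

```shell
# On the remote machine: start a notebook server without opening a browser
jupyter notebook --no-browser --port=8888

# On the local machine: forward the port over ssh, then browse to localhost:8888
ssh -N -L 8888:localhost:8888 user@remote.host
```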

Also, I wanted to note from a workflow standpoint that I'd discourage you from having a generic DEVELOPMENT branch that all development goes into. Instead, each feature should have its own branch that comes off of a common development branch (or possibly master directly). That development branch should always strive to be fully deployable, with the goal of merging it into master when it's time to spin off a release. When a feature branch has been discussed and approved, it gets merged in.

But more generally, I think the WESTPA team should have a well-defined workflow for adding features. Other big projects spell them out in the docs:

http://scikit-learn.org/stable/developers/contributing.html http://msmbuilder.org/3.6.0/contributing.html etc.

I know this is diverging from the main topic of the issue, but to preserve the long-term maintainability of the code, I think it behooves us to have a well-defined process that includes automated test runs and pull requests.

astatide commented 8 years ago

@synapticarbors, thanks for the workflow suggestion. I agree; we don't have a well-defined workflow, so it's easy to stumble into a development situation where changes and fixes end up getting built on top of each other without being merged.

For what it's worth, I'd been thinking about breaking development of this off into another branch to keep this one focused on changes to the kinetics code, but hadn't decided if it was worth it. Development sins aside, though, I wanted feedback before making any more changes. I'll be opening another topic on the kinetics changes soon, once I can work through the writeup and document why the changes are necessary (as well as cleaning the code).

Anyway, the suggestion about leveraging Jupyter is worth looking into, and it's nice to hear about other people's workflows. We could provide 'easy to import' modules (and sample notebooks) that work with a Jupyter notebook and greatly simplify analysis for new users, exposing the same sort of data we're exposing here. Actually, that's probably pretty straightforward with this tool: when the object is created, it does all the work necessary to prepare the various datasets. I suppose you'd really only have to:

  1. have the data locally available, and
  2. instantiate the object.

Which is something I hadn't really thought of before. There are some 'convenience' plotting functions built on matplotlib that would already work reasonably well here.

Adam

ltchong commented 8 years ago

Hi Josh,

Thanks for bringing up these ideas of yours again about the workflow -- they have been on our list of things to do. It is very useful to see how projects like MSMBuilder that have been around for much longer than WESTPA have evolved in terms of handling workflow, etc. and we will keep it in mind. We still have a very small group of developers, so it will take some time to get everything in place.

Best, Lillian



astatide commented 7 years ago

Looking around, it seems it's a little difficult (if not impossible) to cleanly launch an interface-agnostic IPython notebook server from a script, and impossible to attach an IPython notebook to an already-running kernel.

There may be a magic command, but it's probably much easier to simply modify the west script in $WEST_ROOT/bin to accept a '--notebook' flag that launches a Jupyter notebook. The user could then create a notebook, import the module (we could provide examples of how to do this), and use the convenience functions in w_ipython, if they wanted.

On the user's end, this takes care of all the variable setting required to launch a WESTPA script. On our end, it's not that difficult, either: the west script already accepts flags (strace, etc.) that aren't passed on to the Python binary, so the framework is there, so to speak.
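
The flag handling on the Python side would be trivial; a sketch (the parser and flag name here are assumptions for illustration, not the actual w_ipython interface):

```python
# Sketch: branch on a hypothetical --notebook flag before dropping the user
# into an analysis session.
import argparse

def make_parser():
    parser = argparse.ArgumentParser(prog='w_ipython')
    parser.add_argument('--notebook', action='store_true',
                        help='launch a Jupyter notebook instead of an '
                             'interactive IPython prompt')
    return parser

args = make_parser().parse_args(['--notebook'])
print(args.notebook)  # True
```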

[screenshot]

Seems to work well enough. The user could then launch

w_ipython --notebook

to launch a Jupyter notebook, or just

w_ipython

to drop into an interactive prompt.

Still thinking of a good name for this. Also, you can tell that I started from w_kinavg as a base for this, given that the class is still named Kinetics. Hah.