kassonlab / gmxapi

(outdated) fork of https://gitlab.com/gromacs/gromacs
http://gmxapi.org/

Python wrapper for CLI programs #198

Open eirrgang opened 5 years ago

eirrgang commented 5 years ago

subtask of #190

gmx.command_line() produces gmx.Operation objects that can be used in a work graph to invoke subprocesses. gmx.map() generates appropriate graph topologies for, e.g., ArrayOperation or ensemble simulation inputs. The user expresses CLI flags as a dictionary (collections.OrderedDict) of key-value pairs. The user must express execution order with the usual work graph dependency annotation.
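As a rough illustration of why an ordered mapping is a convenient way to express CLI flags, the sketch below flattens an OrderedDict into a list of command-line tokens. The helper name flags_to_argv is hypothetical and is not part of gmxapi; it only shows one way the key-value pairs could be turned into arguments while preserving the order the user gave.

```python
from collections import OrderedDict


def flags_to_argv(flags):
    """Flatten an ordered mapping of CLI flags into a list of argv tokens.

    Hypothetical helper for illustration only; not a gmxapi function.
    """
    argv = []
    for flag, value in flags.items():
        argv.append(flag)
        # Assumption: a flag may take several values (e.g. several file names).
        if isinstance(value, (list, tuple)):
            argv.extend(str(v) for v in value)
        else:
            argv.append(str(value))
    return argv


flags = OrderedDict([('-f', 'somefile.trr'), ('-s', 'input.tpr')])
argv = flags_to_argv(flags)
```

Because the mapping is ordered, the tokens appear on the command line in the same order in which the user listed the flags.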

eirrgang commented 5 years ago

We have resolved to distinguish between a graph edge that is an ensemble or array of operations and data that is a sequence or array.

We can convert between the two (if necessary) with map and gather, borrowing common meanings of such terms. Helper functions can automatically broadcast data when input data types are known, but we can also use implicitly generated broadcast operations and allow for explicit broadcast helper functions. reduce will also fall into this set of data flow operations when it is explicitly represented at a higher level.
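To make the distinction concrete, here is a plain-Python sketch of the three data flow ideas. The function names broadcast, map_operation and gather are illustrative stand-ins and do not reflect the gmxapi implementation; a list stands for an ensemble of independent members, and a tuple stands for a single piece of sequence-valued data.

```python
def broadcast(value, width):
    """Repeat one value so that each ensemble member receives its own copy."""
    return [value] * width


def map_operation(func, inputs):
    """Apply func once per element, giving one result per ensemble member."""
    return [func(item) for item in inputs]


def gather(results):
    """Collect per-member results into one sequence-valued piece of data."""
    return tuple(results)


trajectories = ['run0.trr', 'run1.trr', 'run2.trr']
# One structure file shared by all members, expressed as an explicit broadcast.
structures = broadcast('input.tpr', len(trajectories))
# Each ensemble member gets its own set of input flags.
members = map_operation(
    lambda pair: {'-f': pair[0], '-s': pair[1]},
    zip(trajectories, structures),
)
# After gather, downstream code sees one datum holding all results.
collected = gather(members)
```

Here members models an edge carrying an ensemble of operations, while collected models ordinary array-like data that a single downstream operation could consume.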

Example user syntax

Further along in #190, we would expect to have something like hbond = gmx.tool.hbond(...) (or gmxtool.hbond, gmx.tool('hbond')(), or something similar). In the simplest first-round case, though, we just wrap the command line, where hbond is the first argument passed to the gmx executable.

In the simplest case:

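import gmx

# Wrap `gmx hbond`: 'hbond' is passed as the first argument to the executable.
hbond = gmx.commandline_operation('gmx',
    arguments=['hbond'],
    # Flags that name input files.
    input={
        '-f': 'somefile.trr',
        '-s': 'input.tpr',
        '-n': 'index.ndx'
    },
    # Flags that name output files.
    output={
        '-num': 'bynum.xvg',
        '-ang': 'hbang.xvg'
    })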

For change #200, I expect to use the same implicit scatter or map idea that was used previously with from_tpr(): an array or list value implicitly generates an array operation; to get the effect of broadcast, a list of identical items is used. This will be refined in #203.
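The implicit scatter described above could be sketched as follows. The expand_inputs helper is hypothetical and is not how gmxapi implements it; it treats list-valued entries as defining the ensemble width. Reusing a scalar value for every member is an extra assumption of this sketch; the approach stated above is to pass a list of identical items.

```python
def expand_inputs(inputs):
    """Return one flag mapping per ensemble member.

    Hypothetical helper for illustration only; not a gmxapi function.
    """
    widths = {len(v) for v in inputs.values() if isinstance(v, (list, tuple))}
    if len(widths) > 1:
        raise ValueError('List-valued inputs must all have the same length.')
    if not widths:
        # No list values: a single, ordinary operation.
        return [dict(inputs)]
    width = widths.pop()
    members = []
    for i in range(width):
        member = {}
        for flag, value in inputs.items():
            # Assumption of this sketch: a scalar is reused by every member.
            if isinstance(value, (list, tuple)):
                member[flag] = value[i]
            else:
                member[flag] = value
        members.append(member)
    return members


members = expand_inputs({
    '-f': ['run0.trr', 'run1.trr'],
    # A list of identical items gives the effect of a broadcast.
    '-s': ['input.tpr', 'input.tpr'],
    '-n': 'index.ndx',
})
```

Each element of members would then correspond to one subprocess invocation in the resulting array operation.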

Design decisions

It seems inelegant that filename options could be placed in either keyword_arguments or input/output, and the idea of intelligently handling non-scalar values seems like unnecessary complexity. For this issue, #200 should remove keyword_arguments. Users can manually append elements to arguments to the same effect.

Implementation notes

For the above examples to work, we should specify that arguments are added to the command line immediately after the executable.
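A minimal sketch of that ordering rule, assuming the command line is assembled as executable, then arguments, then input flags, then output flags. The build_command function is hypothetical and not part of gmxapi, and the order of the input and output flags after the arguments is an assumption of this sketch. So that the example runs without GROMACS installed, the Python interpreter stands in for the executable and simply echoes the flags it receives.

```python
import subprocess
import sys


def build_command(executable, arguments=(), input=None, output=None):
    """Assemble argv with `arguments` placed immediately after the executable.

    Hypothetical helper; the parameter names mirror the example syntax above.
    """
    command = [executable]
    command.extend(arguments)
    # Assumption: input flags precede output flags.
    for flags in (input or {}, output or {}):
        for flag, filename in flags.items():
            command.extend([flag, filename])
    return command


command = build_command(
    'gmx',
    arguments=['hbond'],
    input={'-f': 'somefile.trr', '-s': 'input.tpr'},
    output={'-num': 'bynum.xvg'},
)

# Stand-in executable: the Python interpreter prints the flags it was given.
echo = build_command(
    sys.executable,
    arguments=['-c', 'import sys; print(sys.argv[1:])'],
    input={'-f': 'somefile.trr'},
)
result = subprocess.run(
    echo, stdout=subprocess.PIPE, universal_newlines=True, check=True
)
```

Placing arguments first is what lets 'hbond' act as the subcommand, and it is also why users can append extra tokens to arguments to pass options that are neither input nor output files.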