Open sjdv1982 opened 3 years ago
Mostly done. To be done for 0.7:
`__version__` for modules
TODO: adapt the document to mention "which"
Bash and docker transformers have now been unified. In the document, remove the version for modules; it may not be a good idea.
"image" is now renamed to "docker"
In the long term, with help from experts, this could grow into a formal standard for reproducible computing. For now, this is more of a roadmap/description of the situation.
Scope: Transformations. In the long term, transformations in any language. For now, just Python/IPython, because Seamless internally translates all transformations into Python/IPython transformations that integrate the foreign code (using cffi for compiled languages, popen for bash, IPython magics for cython/R, etc.).
Mechanisms
Data dependencies. Each celltype has a canonical serialization/deserialization, and the checksum (SHA3-256) on the canonical serialization is computed. Only the celltypes "cson" and "yaml" (and "python", see below) are different, as they have a distinct semantic checksum. CSON and YAML are first translated into JSON, and then their checksum is computed. This means that comments etc. can be added without retriggering computation. Status: solved.
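A minimal sketch of the semantic-checksum idea, not Seamless's actual implementation. It starts from JSON text as a stand-in for already-translated CSON/YAML, and the normalization used here (sorted keys, compact separators, trailing newline) is an assumption; the real canonical serialization is not specified in this document. The function name `semantic_checksum` is hypothetical.

```python
import hashlib
import json


def semantic_checksum(json_text: str) -> str:
    """SHA3-256 over a normalized JSON serialization of the parsed value."""
    value = json.loads(json_text)
    # Assumption: this normalization stands in for the canonical serialization.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":")) + "\n"
    return hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()


# Same data, different layout and key order: the checksum is unchanged,
# so a purely cosmetic edit would not retrigger computation.
compact = '{"x": 1, "y": [1, 2]}'
pretty = '{\n  "y": [1, 2],\n  "x": 1\n}'
changed = '{"x": 2, "y": [1, 2]}'

same = semantic_checksum(compact) == semantic_checksum(pretty)
different = semantic_checksum(compact) != semantic_checksum(changed)
```

Parsing before hashing is what makes comments and formatting irrelevant: only the parsed value reaches the hash function.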
Transformer code without code dependencies. Transformer code is simply another data dependency of the transformation. The code is supposed to contain a block of statements that eventually assign a variable, whose name (normally `result`) and celltype are defined in the `__output__` property of the transformation. The code is evaluated by its semantic checksum, which means for (Python) code, the creation of an AST buffer using `ast.parse` and `ast.dump` (*). Status: solved for 0.7.
Transformer code with module dependencies. The celltype of a dependency can also be "module". A module is a dict with the following properties:
- `type`: "interpreted" or "compiled". Compiled modules are discussed elsewhere.
- `language`: "ipython" or "python". Other languages are discussed elsewhere.
- `code`: For simple modules, the (I)Python code. For packages, a dict of "filename":"python code" entries.

Status: Simple modules and packages work.
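To illustrate the module dict described above, here is a sketch of a simple module and a package. The file names, the code inside them, and the helper `is_package` are all illustrative assumptions; only the three property names and their allowed values come from the text.

```python
# A simple interpreted module: "code" is the Python source itself.
simple_module = {
    "type": "interpreted",
    "language": "python",
    "code": "def add(a, b):\n    return a + b\n",
}

# A package: "code" is a dict of "filename": "python code" entries.
# The file names below are hypothetical.
package_module = {
    "type": "interpreted",
    "language": "python",
    "code": {
        "__init__.py": "from .util import add\n",
        "util.py": "def add(a, b):\n    return a + b\n",
    },
}


def is_package(module: dict) -> bool:
    """Hypothetical helper: a package stores its code as a dict of files."""
    return isinstance(module["code"], dict)
```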
Transformations with an environment. The environment can be specified as an image, as a conda environment, and/or as a set of capabilities. For a transformer to be executed, only one of the three needs to be matched. In other words, a transformation can be executed because of an image match OR a conda match OR a capability match. Status: Solved for 0.7.
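The OR rule can be sketched as follows. This is an illustration under assumptions, not Seamless code: `can_execute` and the `matchers` mapping are hypothetical names, and treating an environment with no specifications as unmatched is an arbitrary choice of this sketch.

```python
from typing import Any, Callable, Mapping


def can_execute(
    env: Mapping[str, Any],
    matchers: Mapping[str, Callable[[Any], bool]],
) -> bool:
    """Return True if any one specified part of the environment is matched."""
    for key in ("image", "conda", "capabilities"):
        if key in env and matchers[key](env[key]):
            return True
    return False


# Hypothetical executor: no container runtime, but numpy and scipy installed.
matchers = {
    "image": lambda spec: False,
    "conda": lambda spec: set(spec["dependencies"]) <= {"numpy", "scipy"},
    "capabilities": lambda spec: False,
}

env = {
    "image": {"name": "example/image"},
    "conda": {"dependencies": ["numpy"]},
}

executable = can_execute(env, matchers)  # the conda match alone is sufficient
```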
Environment
The environment is a transformer property `__env__` that can have the following properties: `image`, `conda`, `capabilities`, `powers`.
Image
A dict that contains at least `name`, which is the name of an image. This is in principle a Docker image, although Singularity may be used to actually execute the transformation. `version` or `checksum` (but not both) may be added. In case of `checksum`, this is a Docker digest, not a Seamless checksum. Status: Solved for 0.7.
Conda
The same as what goes in an `environment.yml` file, i.e. a list of channels and a list of dependencies. The channels are optional. The dependencies may contain version specifications. No need for a `name` field. Seamless (or any other software that will execute the transformer) will interrogate Conda to check if the dependencies are installed. Seamless will refuse to install new packages, but other software may. Status: Solved for 0.7.
Capabilities
Each Seamless instance may have a list of abstract capabilities registered. It can execute transformations that require (a subset of) those capabilities, and no others. Capabilities can be major or minor. Major capabilities are analogous to images: to satisfy major capability [A, B], you would normally have to create a merged Docker image of A and B. Minor capabilities are analogous to packages, but more abstract, as they are not necessarily limited to conda packages. With each release of Seamless, a concrete meaning of each capability is defined. Therefore, individual capabilities do not have a version number, but all capabilities together refer to a Seamless release number. Status: Solved for 0.7.
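A sketch of what an environment with image, conda and capabilities entries might look like, together with two of the rules stated above: `version` and `checksum` are mutually exclusive for an image, and required capabilities must be a subset of the registered ones. The concrete values, the capability names, and the helpers `validate_image` and `capabilities_satisfied` are assumptions made for illustration.

```python
from typing import Iterable

# Hypothetical environment; the values are placeholders.
env = {
    "image": {"name": "ubuntu", "version": "20.04"},
    "conda": {
        "channels": ["conda-forge"],               # optional
        "dependencies": ["numpy=1.20", "scipy"],   # version specs allowed
    },
    "capabilities": ["capability-a"],
}


def validate_image(image: dict) -> None:
    """Require a name; allow version or checksum, but not both."""
    if "name" not in image:
        raise ValueError("image requires a 'name'")
    if "version" in image and "checksum" in image:
        raise ValueError("specify 'version' or 'checksum', not both")


def capabilities_satisfied(required: Iterable[str], registered: Iterable[str]) -> bool:
    """A transformation may only require capabilities that are registered."""
    return set(required) <= set(registered)


validate_image(env["image"])
ok = capabilities_satisfied(env["capabilities"], ["capability-a", "capability-b"])
missing = capabilities_satisfied(["capability-c"], ["capability-a", "capability-b"])
```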
Powers
A transformation that requests a power must be granted that power. There is no other way to execute the code in the transformation, but the checksum can of course be substituted with equivalent code that does not require it. For now, the following powers are (planned to be) supported:
- `get_ipython`
- `environment.image` is provided as argument `image` to the transformer code.
- `environment.conda` is provided as argument `conda` to the transformer code.

**Status** `ipython` and `docker` have been solved for 0.7. `conda` is long-term. `singularity` may never arrive.

(*) = A filename is provided to `ast.parse`, but this does not change the AST dump, only the error messages.
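A minimal sketch of a semantic checksum for Python code based on `ast.parse` and `ast.dump`, as described for transformer code. The function name `python_semantic_checksum`, the default filename, and the choice to hash the UTF-8 encoded dump with SHA3-256 are assumptions of this sketch; the exact AST buffer that Seamless builds may differ.

```python
import ast
import hashlib


def python_semantic_checksum(source: str, filename: str = "<transformer>") -> str:
    """Hash the AST dump of the source rather than its text."""
    tree = ast.parse(source, filename=filename)
    # By default ast.dump omits line and column attributes, so layout
    # and comments do not influence the result.
    dump = ast.dump(tree)
    return hashlib.sha3_256(dump.encode("utf-8")).hexdigest()


code_a = "result = a + b\n"
code_b = "# add the inputs\nresult = a  +  b  # same statement\n"
code_c = "result = a - b\n"

same_semantics = python_semantic_checksum(code_a) == python_semantic_checksum(code_b)
different_code = python_semantic_checksum(code_a) != python_semantic_checksum(code_c)
# The filename only affects error messages, not the dump.
filename_irrelevant = (
    python_semantic_checksum(code_a, "one.py")
    == python_semantic_checksum(code_a, "two.py")
)
```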