ihodes opened this issue 8 years ago
Ketrew's engine also has a database to store workflow metadata; I could imagine making this database available to workflows as scratch space for key/value storage. I could also imagine that going horribly wrong: workflows could hose the engine's storage, and when Ketrew is used as a shared server, everyone's workflows could go down with it.
It's probably best not to include this functionality in the workflow engine and instead let users decide how to handle state between nodes themselves. For our lab's purposes, you could set up a small Redis server or something similar if you don't like files. Can you link to the code for the specific use case that motivated this issue?
The trouble is that passing state is rather clunky currently, though the newer API certainly makes it easier. For an example of storing the output of a call to Cycledash, see below. A "witness" file (a @smondet-ism) is used as the "product" of the workflow node; it contains the HTTP response to the request made by the node's build process. This witness file can then be depended on (not shown; it is much more complicated, since it requires some shell-escaping and `cat`-ing in the witness file), and its path used in subsequent nodes.
A `Program.t`-compatible `Pipe` product could remove a lot of this boilerplate and the shell-escaping, and make it clearer to a reader what the pipeline is trying to do when passing output through nodes.
```ocaml
let post_bam_to_cycledash ~project_name ~bam_path ~edges ~cycledash_url =
  let open KEDSL in
  let open Biokepi.Run_environment in
  let open Biokepi.Workflow_utilities in
  let name = Printf.sprintf "POST BAM to Cycledash: %s" bam_path in
  (* The witness file holds the HTTP response and is this node's product. *)
  let witness_file = bam_path ^ ".cycledash-post-bam-witness" in
  (* Remove the witness file if the node fails: curl -f exits non-zero on
     HTTP errors, but the shell redirection may already have created it. *)
  let rm_witness = Remove.file ~run_with:Demeter.machine witness_file in
  let host = Demeter.host in
  let make =
    (* bam_path and project_name are spliced into the shell command and the
       JSON body without any escaping. *)
    Machine.quick_command Demeter.machine Program.(
      shf
        {s|curl -f -H 'Content-Type: application/json' %s/api/bams -d '{"uri": "%s", "projectName": "%s"}' > %s |s}
        cycledash_url bam_path project_name witness_file
    )
  in
  workflow_node ~name
    ~edges:(edges @ [on_failure_activate rm_witness])
    ~make
    (single_file ~host witness_file)
```
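The downstream half (depending on the witness file and `cat`-ing it in) is not shown in the code above. As a rough sketch of only the shell-escaping part, the OCaml standard library's `Filename.quote` can single-quote the witness path before it is spliced into a command. `command_using_witness` and its arguments are hypothetical names, not part of Ketrew or Biokepi, and the resulting string would still need to be wrapped in a `Program.shf` call like the one above.

```ocaml
(* Hypothetical helper (not part of Ketrew or Biokepi): build a shell command
   for a downstream node that reads the witness file and passes its contents
   to [cmd] as a single argument. *)
let command_using_witness ~witness_file ~cmd =
  (* Filename.quote single-quotes the path (escaping any embedded single
     quotes), so spaces or quotes in it cannot break the command. The
     surrounding double quotes keep the substituted response as one word. *)
  Printf.sprintf "%s \"$(cat %s)\"" cmd (Filename.quote witness_file)

let () =
  print_endline
    (command_using_witness
       ~witness_file:"/data/sample 1.bam.cycledash-post-bam-witness"
       ~cmd:"echo")
```

Even this small piece shows the kind of boilerplate every consuming node has to repeat.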
For example, an ID or other data returned from a web service may need to be passed to subsequent nodes for further processing. Right now the sanest way to do this is to create a file with a known name and location and pass it around; that involves a lot of moving parts for something that could be simple.
I propose something like a `Pipe` module that creates named, persistent pipes between workflow nodes. This could be implemented as a named file, and would be easier to implement using Biokepi's newer API that uses `product`s; a `Pipe` would have a `product` method, as well as a read/write interface.
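To make the proposal more concrete, here is a minimal, hypothetical sketch of a file-backed `Pipe` using only the OCaml standard library. Everything in it (`Pipe.create`, `product`, `write`, `read`, the `.pipe` suffix) is an assumed interface, not an existing Ketrew or Biokepi API; in particular, `product` returns a bare path here, whereas a real version would presumably return a Ketrew product such as the `single_file` used in the Cycledash example.

```ocaml
(* Hypothetical sketch of the proposed Pipe: a named value backed by a file
   in a shared directory. Not part of Ketrew or Biokepi. *)
module Pipe : sig
  type t
  val create : dir:string -> name:string -> t
  val product : t -> string      (* path a workflow node could depend on *)
  val write : t -> string -> unit
  val read : t -> string option  (* None until something has been written *)
end = struct
  type t = { path : string }

  let create ~dir ~name = { path = Filename.concat dir (name ^ ".pipe") }

  let product t = t.path

  let write t value =
    (* Write to a temporary file next to the target, then rename it into
       place, so a reader never observes a half-written value. *)
    let tmp = t.path ^ ".tmp" in
    let oc = open_out_bin tmp in
    output_string oc value;
    close_out oc;
    Sys.rename tmp t.path

  let read t =
    if Sys.file_exists t.path then begin
      let ic = open_in_bin t.path in
      let s = really_input_string ic (in_channel_length ic) in
      close_in ic;
      Some s
    end else None
end

let () =
  let p =
    Pipe.create ~dir:(Filename.get_temp_dir_name ()) ~name:"cycledash-bam-id"
  in
  Pipe.write p "{\"id\": 42}";
  match Pipe.read p with
  | Some v -> print_endline v
  | None -> print_endline "pipe is empty"
```

Writing through a temporary file plus `Sys.rename` means a reader sees either the previous value or the new one, since rename is atomic within a single POSIX filesystem; the producing node would call `write`, and consumers would depend on `product` and call `read`.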