liyawang closed this issue 8 years ago.
Workflow detail implementations
An example Galaxy workflow for RNA-seq differential expression analysis: http://www.myexperiment.org/workflows/4126.html
Divide development into three parts:
1. Running the workflow in the backend
Run each app once; the results are archived to brie, and input-output connections are built by "dragging". The archived path, app info, parameters, and inputs are saved to the job's JSON file (.err). Examples that run very fast with a few apps:
a. The script needs to read each job JSON for job submission (constructing a workflow JSON?).
b. The script needs to decide when a job can be submitted once its dependencies are cleared (check every 5 minutes?).
c. The script needs to perform job submission in a loop without archiving intermediate results.
d. The script needs to replace the 'archived path' with the 'Agave path' on brie7 for inputs.
e. The script needs to rename the outputs from each step (step1.a.txt, step2.a.txt) and move them to the folder of the last job before archiving all outputs of the workflow back to brie. (Delete all previous folders? What if the workflow fails in the middle? What about inputs that are used repeatedly? Would enforcing the order of apps in a workflow simplify coding?)
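Items b, d and e above can be sketched as a few small functions. This is a minimal sketch, not the planned script: the `Step` and `Chained` records, their field names, and the `agave://brie7/` prefix used in the usage note are hypothetical stand-ins for whatever the job JSON files and Agave paths actually contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chained:
    """Hypothetical marker for an input produced by an earlier step."""
    step_no: int
    filename: str


@dataclass
class Step:
    """One step rebuilt from a job's JSON file (field names are assumptions)."""
    step_no: int
    app_id: str
    inputs: dict             # input name -> archived path or Chained(...)
    outputs: list            # output filenames produced by the app
    status: str = "pending"  # pending | submitted | done


def is_ready(step, steps_by_no):
    """True when every step this step depends on has finished."""
    return all(
        steps_by_no[v.step_no].status == "done"
        for v in step.inputs.values()
        if isinstance(v, Chained)
    )


def ready_steps(steps):
    """One polling pass (e.g. every 5 minutes): pending steps whose dependencies are done."""
    by_no = {s.step_no: s for s in steps}
    return [s for s in steps if s.status == "pending" and is_ready(s, by_no)]


def resolve_inputs(step, archive_root, agave_root, work_dir):
    """Map each input to the path the submitted job should read."""
    resolved = {}
    for name, value in step.inputs.items():
        if isinstance(value, Chained):
            # Chained inputs point at the renamed output of the earlier step.
            resolved[name] = f"{work_dir}/step{value.step_no}.{value.filename}"
        elif value.startswith(archive_root):
            # Swap the archived (brie) prefix for the Agave-visible prefix.
            resolved[name] = agave_root + value[len(archive_root):]
        else:
            resolved[name] = value
    return resolved


def renamed_outputs(step):
    """Prefix outputs with the step number (step1.a.txt, step2.a.txt, ...)."""
    return [f"step{step.step_no}.{name}" for name in step.outputs]
```

A driver loop would call `ready_steps` on each poll, submit what it returns using `resolve_inputs(step, "http://data.sciapps.org/results/", "agave://brie7/", work_dir)`, and mark steps `done` as jobs finish; prefixing outputs with the step number keeps two `a.txt` files from colliding in the last job's folder.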
2. Example workflow
Bring up an interface that displays the default parameters for all apps used in the workflow, one by one, and allows the user to modify inputs and parameters before submitting the workflow. Chained inputs cannot be modified by the user and are displayed as 'step1.maker.gff.gz', etc.
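The form behind this interface can be sketched as data: one section per app, with chained inputs shown under their `stepN.filename` name and marked read-only. A minimal sketch, assuming each step is a dict with hypothetical `app_id`, `inputs` and `parameters` keys; the real interface would render these fields rather than return them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chained:
    """Hypothetical marker for an input produced by an earlier step."""
    step_no: int
    filename: str


def build_form(steps):
    """One form section per app; chained inputs are displayed but not editable."""
    form = []
    for no, step in enumerate(steps, start=1):
        fields = []
        for name, value in step["inputs"].items():
            if isinstance(value, Chained):
                shown = f"step{value.step_no}.{value.filename}"
                fields.append({"name": name, "value": shown, "editable": False})
            else:
                fields.append({"name": name, "value": value, "editable": True})
        for name, default in step["parameters"].items():
            # Parameters start at the defaults used when the app was last run.
            fields.append({"name": name, "value": default, "editable": True})
        form.append({"step": no, "app_id": step["app_id"], "fields": fields})
    return form


def apply_edits(form, edits):
    """Apply {(step, field name): value}; reject the whole batch if any edit hits a chained input."""
    fields = {(s["step"], f["name"]): f for s in form for f in s["fields"]}
    for key in edits:
        if key not in fields:
            raise KeyError(key)
        if not fields[key]["editable"]:
            raise ValueError(f"{key} is a chained input and cannot be modified")
    for key, value in edits.items():
        fields[key]["value"] = value
    return form
```

Validating every edit before applying any of them means a rejected submission leaves the form unchanged.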
3. Building a workflow
An interface to add steps from the history:
a. Display a 5-step window in the center panel, but allow a workflow to be constructed with fewer than 5 steps.
b. The user can add more steps (fields) to the workflow form.
c. The user can drag a job from the history panel to a workflow field (input-output matching is done automatically by 'dragging').
d. The user can construct the workflow, download it, upload it, and run it (a workflow JSON? Does the user need to log in?)
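One way to read "matching is done automatically by dragging" is: when a history job is dropped into the form, any input URL that equals an output URL of an earlier step becomes a link to that step. A minimal sketch of that idea plus JSON download/upload, assuming hypothetical job dicts with `app_id`, `inputs`, `parameters` and `outputs` (each output a `name`/`url` pair):

```python
import json


def add_step(workflow, job):
    """Append a job dragged from the history panel as the next step.

    Any input whose URL matches an output of an earlier step becomes a link
    {"step": n, "output": name}; this is the automatic matching.
    """
    produced = {
        out["url"]: {"step": step["step"], "output": out["name"]}
        for step in workflow["steps"]
        for out in step["outputs"]
    }
    workflow["steps"].append({
        "step": len(workflow["steps"]) + 1,
        "app_id": job["app_id"],
        "parameters": dict(job["parameters"]),
        "inputs": {name: produced.get(url, url) for name, url in job["inputs"].items()},
        "outputs": [dict(out) for out in job["outputs"]],
    })
    return workflow


def download(workflow):
    """Serialize the workflow so the user can save it."""
    return json.dumps(workflow, indent=2)


def upload(text):
    """Load a saved workflow and check that links only point to earlier steps."""
    workflow = json.loads(text)
    for step in workflow["steps"]:
        for value in step["inputs"].values():
            if isinstance(value, dict) and value["step"] >= step["step"]:
                raise ValueError(f"step {step['step']} links to a later step")
    return workflow
```

Because links only ever point backwards, the step order in the JSON is already a valid run order, which is one argument for enforcing the order of apps.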
Workflow ideas (change output links to the form): http://data.sciapps.org/results/test/readme.txt?jobid=n
a. Job ids start from 1, 2, … (shown in the right column).
b. The appid, inputid1, inputid2, etc. that generated an output can be retrieved from its jobid.
c. Outputs are unique by jobid and output filename, assuming that an app's output names are always the same no matter how its inputs change.
d. A workflow page will be built for constructing an automatic workflow.
e. All steps have been run at least once.
f. Nodes can be deleted (e.g. a failed analysis that was repeated later; not urgent).
g. The workflow can be saved.
h. The workflow can be run (results are archived to brie, so Agave job submission does not need to be modified).
i. Previously used inputs and parameters can be brought up for modification (parameter sweep).
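The jobid scheme above can be sketched as an in-memory registry. This is a hypothetical illustration, not the SciApps data model: ids are assigned from a counter so they are never reused after a node is deleted, `producer` answers "which app and inputs made this output", and `sweep` copies a job with one parameter varied. Only the `?jobid=n` link form comes from the issue.

```python
from dataclasses import dataclass, replace


@dataclass
class JobRecord:
    app_id: str
    inputs: dict       # input name -> input URL
    parameters: dict   # parameter name -> value
    outputs: list      # output filenames (assumed fixed per app)


def output_link(output_url, job_id):
    """Output link in the proposed form <url>?jobid=n."""
    return f"{output_url}?jobid={job_id}"


class JobRegistry:
    """Hypothetical registry; job ids start from 1 and are never reused."""

    def __init__(self):
        self.jobs = {}
        self._next_id = 1

    def add(self, record):
        job_id = self._next_id
        self._next_id += 1
        self.jobs[job_id] = record
        return job_id

    def producer(self, job_id):
        """App id and input ids that generated the outputs of job_id."""
        record = self.jobs[job_id]
        return record.app_id, list(record.inputs.values())

    def delete(self, job_id):
        """Remove a node, e.g. a failed analysis that was repeated later."""
        del self.jobs[job_id]

    def sweep(self, job_id, name, values):
        """Copies of a job with one parameter varied, ready for resubmission."""
        base = self.jobs[job_id]
        return [
            replace(base, inputs=dict(base.inputs), parameters={**base.parameters, name: v})
            for v in values
        ]
```

Since (jobid, filename) is unique, a plain dict keyed by jobid is enough to trace any output back to its app and inputs.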
Here is a four-step workflow:
GLM --> AdjustP --> XYPlot1
 |--> XYPlot2 (not dependent on AdjustP)
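The example can be treated as a small dependency graph. A minimal sketch using Python's standard `graphlib` (Python 3.9+) groups the steps into rounds that could be submitted together; it assumes XYPlot1 reads AdjustP's output and XYPlot2 reads GLM's output, which is how the diagram reads.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose outputs it needs (read from the diagram).
DEPENDS_ON = {
    "GLM": set(),
    "AdjustP": {"GLM"},
    "XYPlot1": {"AdjustP"},
    "XYPlot2": {"GLM"},  # not dependent on AdjustP
}


def submission_rounds(graph):
    """Group steps into rounds; every step in a round can be submitted together."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        rounds.append(ready)
        sorter.done(*ready)
    return rounds


print(submission_rounds(DEPENDS_ON))
# [['GLM'], ['AdjustP', 'XYPlot2'], ['XYPlot1']]
```

XYPlot2 lands in the same round as AdjustP, so it need not wait for AdjustP to finish.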
Output folders:
GLM: http://data.sciapps.org/results/glm-tassel-5-1-23-0fJGkCBl8B/
AdjustP: http://data.sciapps.org/results/adjustpvalue-0-0-1-Tx0ryhviOK/
XYPlot1: http://data.sciapps.org/results/xyplot-0-0-2-wyGcfBpaoL/
XYPlot2: http://data.sciapps.org/results/xyplot-0-0-2-O9ULw4qog4/
One thing we ignored before is that Agave jobs might fail for any number of reasons, so the workflow engine will need to check outputs (is this possible?) and resubmit a job if it failed.
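Checking outputs and resubmitting can be sketched as a retry loop around three hooks. This is a minimal sketch under stated assumptions: `submit`, `wait_for` and `outputs_ok` are hypothetical callables wrapping Agave job submission, status polling and an output check; `expected_outputs_present` is one possible check, assuming the outputs are visible as local files.

```python
import os
import time


def expected_outputs_present(folder, names):
    """Example check: every expected output exists and is non-empty."""
    paths = [os.path.join(folder, name) for name in names]
    return all(os.path.isfile(p) and os.path.getsize(p) > 0 for p in paths)


def run_with_retry(submit, wait_for, outputs_ok, max_attempts=3, delay_seconds=0):
    """Submit a job, check its outputs, and resubmit if the check fails.

    submit() -> job id; wait_for(job_id) blocks until the job has ended;
    outputs_ok(job_id) -> bool. Returns (job_id, attempt) on success.
    """
    for attempt in range(1, max_attempts + 1):
        job_id = submit()
        wait_for(job_id)
        # Check outputs rather than trusting the reported status alone.
        if outputs_ok(job_id):
            return job_id, attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"job failed after {max_attempts} attempts")
```

Capping the attempts keeps a job that fails for a non-transient reason (a bad input, say) from being resubmitted forever.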