warelab / sciapps

SciApps: a cloud-based platform for reproducible bioinformatics workflows
https://www.sciapps.org
Apache License 2.0

Example workflow for developing the workflow script #29

Closed liyawang closed 8 years ago

liyawang commented 8 years ago

Here is a four-step workflow: GLM --> AdjustP --> XYPlot1, with a separate branch |--> XYPlot2 that does not depend on AdjustP.

Output folders:
GLM: http://data.sciapps.org/results/glm-tassel-5-1-23-0fJGkCBl8B/
AdjustP: http://data.sciapps.org/results/adjustpvalue-0-0-1-Tx0ryhviOK/
XYPlot1: http://data.sciapps.org/results/xyplot-0-0-2-wyGcfBpaoL/
XYPlot2: http://data.sciapps.org/results/xyplot-0-0-2-O9ULw4qog4/

One thing we ignored before is that an Agave job might fail for any number of reasons. The workflow engine will therefore need to check each job's outputs (is this possible?) and re-submit the job if it failed.
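
A minimal sketch of the check-and-resubmit idea, in Python. `submit_job` and `outputs_ok` are hypothetical callables standing in for the real Agave submission and output check, neither of which is specified in this thread; the retry limit is also an assumption.

```python
from typing import Callable


def run_with_retries(
    submit_job: Callable[[], str],
    outputs_ok: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Submit a job, check its outputs, and re-submit if the check fails.

    submit_job (hypothetical) submits the job, waits for it to finish,
    and returns a job id. outputs_ok (hypothetical) reports whether the
    expected outputs exist for that job id.
    """
    for _ in range(max_attempts):
        job_id = submit_job()
        if outputs_ok(job_id):
            return job_id
    # Give up rather than looping forever on a job that always fails.
    raise RuntimeError(f"job failed after {max_attempts} attempts")
```

Capping the number of attempts keeps a permanently broken step from blocking the rest of the workflow indefinitely.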

liyawang commented 8 years ago

Workflow implementation details

  1. Implement re-launch buttons in the right panel that bring up the app interface with pre-set inputs and parameters (except user-uploaded files? We need a policy on user data management)
  2. The buildWorkflow page will loop through the right panel and re-launch all 'completed' jobs one by one
  3. Allow some steps to be deleted even if 'completed' (not archived, no outputs, needs adjustment)
  4. Step 3 can also be achieved by letting the user manually drag the 're-launch' buttons one by one to build the workflow
  5. Once done, the user can submit the workflow or download it (a json file)
  6. For running the workflow, the dependency file will be named step1:GLMstat.txt (assuming filenames never change between runs; files need to be renamed so that the steps do not clash. Might need to use step1 instead of step1: )
  7. For now, we will loop through the steps one by one to submit the jobs. Later, will try to combine with Makeflow to utilize workflow engines
  8. For now, we don't have visualization. Later, D3.js will be used to bring up the workflow visualization (nodes and links) and show progress by changing the color of each node. The user can also save the visualization.
  9. The workflow json will be converted from the Agave err file by taking the first 50%, truncating the _links section, and adding 'step:1,2,3' etc. before the inputs section. Might need total steps?
  10. The user can download the workflow json and upload it to populate the workflow page to run (can be developed later, but it is a nice way to share)
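
The conversion described in item 9 could be sketched roughly as follows, in Python. This assumes each job's JSON has already been extracted from the err file and parsed into a dict (the "first 50%" extraction is not covered here). The `steps` and `total_steps` keys of the resulting workflow are illustrative assumptions, not an agreed format.

```python
from typing import Dict, List


def build_workflow(job_records: List[Dict]) -> Dict:
    """Combine parsed job JSON records into one workflow description.

    For each record the '_links' section is dropped and a 1-based 'step'
    number is added ahead of the remaining keys (so it precedes 'inputs').
    The output layout ('steps', 'total_steps') is a hypothetical format.
    """
    steps = []
    for number, record in enumerate(job_records, start=1):
        step = {"step": number}
        for key, value in record.items():
            # Skip '_links'; also skip any stale 'step' so numbering stays sequential.
            if key not in ("_links", "step"):
                step[key] = value
        steps.append(step)
    return {"total_steps": len(steps), "steps": steps}
```

The result can be written out with `json.dumps(build_workflow(records), indent=2)` to produce the downloadable workflow file mentioned in items 5 and 10.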

liyawang commented 8 years ago

An example Galaxy workflow for RNA-seq differential analysis: http://www.myexperiment.org/workflows/4126.html

liyawang commented 8 years ago

Divide development into three parts:

1. Running the workflow in the backend

Run each app once so that the results are archived to brie; input-output connections are built by "dragging". The archived path, app info, parameters, and inputs are saved to the job's json file (.err). Examples that run very fast with a few apps:

  a. The script needs to read each job json for job submission (constructing a workflow json?)
  b. The script needs to decide when a job can be submitted, i.e. once its dependencies are clear (check every 5 minutes?)
  c. The script needs to perform job submission in a loop without archiving intermediate results
  d. The script needs to replace the 'archived path' with the 'Agave path' on brie7 for inputs
  e. The script needs to rename the outputs from each step (step1.a.txt, step2.a.txt) and move them to the folder of the last job before archiving all outputs of the workflow back to brie (delete all previous folders? What if it fails in the middle? What about inputs that are used repeatedly? Would enforcing the order of apps in a workflow simplify coding?)
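
Points b, c, and e above can be sketched as a polling loop, in Python. `submit` and `is_done` are hypothetical hooks for the real job submission and status check, and the 300-second default simply mirrors the "every 5 minutes?" suggestion. This is one possible shape for the script, not a settled design.

```python
import time
from typing import Callable, Dict, List, Set


def prefixed_name(step_number: int, filename: str) -> str:
    """Rename an output so that files from different steps do not clash."""
    return f"step{step_number}.{filename}"


def run_workflow(
    depends_on: Dict[str, Set[str]],
    submit: Callable[[str], None],
    is_done: Callable[[str], bool],
    poll_seconds: float = 300.0,
) -> List[str]:
    """Submit each step once all of its dependencies have finished.

    depends_on maps a step name to the set of steps it needs.
    Returns the order in which steps were submitted.
    """
    pending = set(depends_on)
    running: Set[str] = set()
    finished: Set[str] = set()
    order: List[str] = []
    while pending or running:
        # Move steps whose jobs have completed into the finished set.
        for step in sorted(running):
            if is_done(step):
                running.remove(step)
                finished.add(step)
        # Submit every pending step whose dependencies are all finished.
        ready = sorted(s for s in pending if depends_on[s] <= finished)
        for step in ready:
            submit(step)
            pending.remove(step)
            running.add(step)
            order.append(step)
        # Nothing running and nothing submittable: a cycle or a missing step.
        if pending and not ready and not running:
            raise ValueError(f"unsatisfiable dependencies: {sorted(pending)}")
        if pending or running:
            time.sleep(poll_seconds)
    return order
```

For the four-step example, and assuming XYPlot2 takes its input directly from GLM, the dependency map would be `{"GLM": set(), "AdjustP": {"GLM"}, "XYPlot1": {"AdjustP"}, "XYPlot2": {"GLM"}}`. Independent steps such as AdjustP and XYPlot2 are then submitted in the same polling round.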

2. Example workflow

Bring up an interface that displays the default parameters for all apps used in the workflow, one by one. Allow the user to modify inputs and parameters before submitting the workflow. Chained inputs cannot be modified by the user and will be displayed as 'step1.maker.gff.gz', etc.

3. Building a workflow

An interface to add steps from history:

  a. Display a 5-step window in the center panel, although a workflow can be constructed with fewer than 5 steps
  b. The user can add more steps (fields) to the workflow form
  c. The user can drag a job from the history panel to a workflow field (input-output matching is done automatically by 'dragging')
  d. The user can construct the workflow, download it, upload it, and run it (a workflow json? Does the user need to log in?)

liyawang commented 8 years ago

Workflow ideas (Change output link to): http://data.sciapps.org/results/test/readme.txt?jobid=n

jobid starts from 1, 2, … (on the right column)
appid, inputid1, inputid2 that generated the output can be retrieved from the jobid
outputs are unique given the jobid and the output filename, assuming output names are always the same for any app no matter how the inputs are changed
a workflow page will be built for constructing an automatic workflow
all steps have been run at least once
can delete nodes (failed analysis repeated later, not urgent)
can save
can run (archiving to brie, so no need to modify Agave job submission)
can bring up used inputs and parameters for modification (parameter sweep)
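
As a small illustration of the proposed link format, the output filename and jobid can be recovered from such a URL with Python's standard library. The function name is hypothetical, and the sketch assumes that jobid is always an integer, as the numbering above suggests.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def parse_output_link(url: str) -> Tuple[str, int]:
    """Split an output link into (output filename, job id)."""
    parts = urlsplit(url)
    # The filename is the last segment of the URL path.
    filename = parts.path.rsplit("/", 1)[-1]
    # Assumes a single integer 'jobid' query parameter is present.
    job_id = int(parse_qs(parts.query)["jobid"][0])
    return filename, job_id
```

For example, `parse_output_link("http://data.sciapps.org/results/test/readme.txt?jobid=3")` yields `("readme.txt", 3)`. The jobid and filename pair then serves as the unique key for an output.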