icgc-argo / workflow-roadmap

Roadmap and management for genomic data processing
GNU Affero General Public License v3.0

POC: Define a Pipeline to test workflow-graph services #39

Open · rosibaj opened this issue 4 years ago

rosibaj commented 4 years ago

Now that the Node/Edge services are complete, we need to test the viability of the pipeline connections. We will hardcode a pipeline and have it use all of the services to verify that each piece works as expected.

Expected Outcome

Publish a sequencing_experiment analysis

Test Data

Dev (same files and metadata as in QA, just with different analysis/study/donor/sample/specimen IDs)

WXS Tumour BAM analysis: https://song.rdpc-dev.cancercollaboratory.org/studies/RDPCDEV-CA/analysis/aee88104-1d07-447d-a881-041d07647ddb

WXS Normal BAM analysis: https://song.rdpc-dev.cancercollaboratory.org/studies/RDPCDEV-CA/analysis/519f3db9-e34f-44eb-9f3d-b9e34f34eb62

QA

WXS Tumour BAM analysis: https://song.rdpc-qa.cancercollaboratory.org/studies/TEST-PR/analysis/1d891219-5b8e-4bc3-8912-195b8ebbc3d7 (alignment run time: ~15 min)

WXS Normal BAM analysis: https://song.rdpc-qa.cancercollaboratory.org/studies/TEST-PR/analysis/56847fc7-7457-4df2-847f-c77457edf2a4 (alignment run time: ~12 min)

Sanger-WXS run time: ~10 min

Note: Exploration during this POC

rosibaj commented 4 years ago

@lindaxiang let's connect on a test data set here.

lindaxiang commented 4 years ago

@rosibaj Sure thing. Let me find the proper test data set for you.

rosibaj commented 4 years ago

@lindaxiang Please indicate a:

lindaxiang commented 4 years ago
rosibaj commented 4 years ago

@rosibaj We need to get this test data into dev after the namespace work. @yalturmes, can you let me know when dev is back up once the namespace/empty-dir work is done?

andricDu commented 4 years ago

Progress Update

@lepsalex @rosibaj In my opinion, the components need a little more refinement before the solution works end to end. I've documented my progress and findings in this update.

Deployment

Ingest Node (RDPC-QA)

Deployed: https://github.com/icgc-argo/workflow-roadmap/issues/60

$ k get deployment ingest-node
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
ingest-node   1/1     1            1           6d23h

Alignment Node (RDPC-QA)

Deployed: https://github.com/icgc-argo/workflow-roadmap/issues/61

Running workflows: https://github.com/icgc-argo/dna-seq-processing-wfs.git at revision 1.5.1

$ k get deployment align-node 
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
align-node   1/1     1            1           80m

Example of a launched workflow: https://wes.rdpc-qa.cancercollaboratory.org/runs/wes-f02af600905540ed99a24cb228499f87
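Both node deployments above were verified by eyeballing the READY column of `k get deployment`. When the hardcoded pipeline is rerun, the same check can be scripted. The following is a minimal sketch, assuming `k` is a shell alias for `kubectl`; `all_ready` is a hypothetical helper, not part of the workflow-graph tooling. It reads the table from stdin and succeeds only if every deployment reports READY as n/n.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the WFG tooling): reads the table printed by
# `kubectl get deployment` on stdin and exits 0 only if every row reports its
# READY column as n/n, i.e. all desired replicas are ready.
all_ready() {
  awk '
    NR == 1 { next }                 # skip the header row (NAME READY ...)
    NF == 0 { next }                 # ignore blank lines
    {
      split($2, r, "/")              # READY column, e.g. "1/1" -> r[1]=1, r[2]=1
      if (r[1] != r[2]) {
        notready = 1
        print "not ready: " $1       # report the deployment name
      }
    }
    END { exit notready }            # unset variable -> exit 0 (all ready)
  '
}

# Usage, assuming `k` is an alias for kubectl as in the output above:
#   k get deployment ingest-node align-node | all_ready && echo "all nodes ready"
```

This is a one-shot check. To block until a rollout finishes instead, `kubectl rollout status deployment/<name>` does that natively.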

Sanger VC Node

No progress.

Development work required

Several issues were identified that block the system from working end to end.

Extra Notes

andricDu commented 4 years ago

I'm using https://github.com/icgc-argo/argo-sandbox/tree/master/wfg as scratch space for my deployment work.

akachru-github commented 2 years ago

All work to support this is now in QA.