populationgenomics / analysis-runner

MIT License

Parse cromwell workflow metadata script #690

Closed: EddieLF closed this 4 months ago

EddieLF commented 4 months ago

This PR adds a script to rescue the results of partially successful runs of the GATK SV single-sample workflow. When this workflow runs on a genome, it is common for every sub-workflow except Manta to succeed. If Manta does not succeed, none of the files from the successful sub-workflows are copied to the main bucket, and no analyses are logged in Metamist.
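The core idea is to rescue whatever finished instead of treating the run as all-or-nothing. It can be sketched as a filter over per-sub-workflow statuses. This is a minimal illustration, not the script's code: `rescuable_outputs` and the dictionary shapes are hypothetical, modelled on the dry-run report shown in this PR, and the `gs://bucket` paths are placeholders.

```python
# Hypothetical sketch: statuses and outputs are keyed by sub-workflow name,
# mirroring the dry-run report in this PR (not the script's real data model).
def rescuable_outputs(
    statuses: dict[str, str], outputs: dict[str, dict[str, str]]
) -> dict[str, dict[str, str]]:
    """Keep outputs only from sub-workflows that finished ("Done").

    A failed or still-running sub-workflow (e.g. Manta) is skipped instead of
    blocking the rescue of every other sub-workflow's results. Sub-workflows
    that finished without any outputs are dropped too.
    """
    return {
        name: files
        for name, files in outputs.items()
        if statuses.get(name) == "Done" and files
    }


statuses = {"Scramble": "Done", "CollectCounts": "Done", "Manta": "Running"}
outputs = {
    "Scramble": {"vcf": "gs://bucket/CPGxxxxxx.scramble.vcf.gz"},
    "CollectCounts": {"counts": "gs://bucket/CPGxxxxxx.counts.tsv.gz"},
    "Manta": {},
}
print(sorted(rescuable_outputs(statuses, outputs)))  # ['CollectCounts', 'Scramble']
```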

The script:

Only the service account can obtain the Cromwell API token (via cpg_utils.cromwell.get_cromwell_oauth_token), so I have been testing with a workflow metadata JSON that I downloaded from the Swagger page of the Cromwell API.
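When working from a locally saved metadata JSON, a per-call status summary might look like the following sketch. It assumes Cromwell's metadata layout, in which `calls` maps `<Workflow>.<Call>` names to a list of attempt records carrying `executionStatus`, `attempt` and (once finished) `outputs`. `summarise_calls` and `load_metadata` are hypothetical helpers, and scatter shards are not handled.

```python
import json
from pathlib import Path


def load_metadata(path: str) -> dict:
    """Read workflow metadata saved locally, e.g. downloaded via the Swagger UI."""
    return json.loads(Path(path).read_text())


def summarise_calls(metadata: dict) -> dict[str, dict]:
    """Map each call's short name to the status and outputs of its latest attempt.

    Assumes unscattered calls: if a call had several shards, only the record
    with the highest "attempt" number is kept.
    """
    summary = {}
    for full_name, attempts in metadata.get("calls", {}).items():
        if not attempts:
            continue
        latest = max(attempts, key=lambda a: a.get("attempt", 1))
        # "GatherSampleEvidence.Manta" -> "Manta"
        short_name = full_name.rsplit(".", 1)[-1]
        summary[short_name] = {
            "status": latest.get("executionStatus"),
            "outputs": latest.get("outputs", {}),
        }
    return summary


# Made-up metadata fragment in the assumed layout.
example = json.loads("""
{"id": "test_workflow_2", "status": "Running",
 "calls": {
   "GatherSampleEvidence.Whamg": [{"attempt": 1, "executionStatus": "Done",
      "outputs": {"vcf": "gs://bucket/CPGxxxxxx.wham.vcf.gz"}}],
   "GatherSampleEvidence.Manta": [{"attempt": 1, "executionStatus": "Failed"},
                                  {"attempt": 2, "executionStatus": "Running"}]}}
""")
for name, info in summarise_calls(example).items():
    print(f"  {name}: {info['status']}")  # "  Whamg: Done", then "  Manta: Running"
```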

Here are the results of running with --dry-run against a local JSON from one of the failed workflows:

Workflow Status for ID ('test_workflow_2',):
  Dataset: my-dataset, Sequencing Group ID: CPGxxxxxx
  Scramble: Done
  LocalizeReads: Done
  CollectCounts: Done
  Manta: Running
  Whamg: Done
  CollectSVEvidence: Done
6 outputs found:
  Scramble:
    vcf: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-Scramble/Scramble/yyy-yyy-yyy-yyy-yyy/call-ScramblePart2/CPGxxxxxx.scramble.vcf.gz
    index: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-Scramble/Scramble/yyy-yyy-yyy-yyy-yyy/call-ScramblePart2/CPGxxxxxx.scramble.vcf.gz.tbi
  LocalizeReads:
  CollectCounts:
    counts: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectCounts/CPGxxxxxx.counts.tsv.gz
  Manta:
  Whamg:
    vcf: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-Whamg/Whamg/yyy-yyy-yyy-yyy-yyy/call-RunWhamgOnCram/CPGxxxxxx.wham.vcf.gz
    index: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-Whamg/Whamg/yyy-yyy-yyy-yyy-yyy/call-RunWhamgOnCram/CPGxxxxxx.wham.vcf.gz.tbi
  CollectSVEvidence:
    split_out_index: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.sr.txt.gz.tbi
    sd_out: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.sd.txt.gz
    disc_out: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.pe.txt.gz
    split_out: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.sr.txt.gz
    disc_out_index: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.pe.txt.gz.tbi
    sd_out_index: gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.sd.txt.gz.tbi
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-Scramble/Scramble/yyy-yyy-yyy-yyy-yyy/call-ScramblePart2/CPGxxxxxx.scramble.vcf.gz to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.scramble.vcf.gz
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-Scramble/Scramble/yyy-yyy-yyy-yyy-yyy/call-ScramblePart2/CPGxxxxxx.scramble.vcf.gz.tbi to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.scramble.vcf.gz.tbi
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectCounts/CPGxxxxxx.counts.tsv.gz to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.counts.tsv.gz
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-Whamg/Whamg/yyy-yyy-yyy-yyy-yyy/call-RunWhamgOnCram/CPGxxxxxx.wham.vcf.gz to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.wham.vcf.gz
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-Whamg/Whamg/yyy-yyy-yyy-yyy-yyy/call-RunWhamgOnCram/CPGxxxxxx.wham.vcf.gz.tbi to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.wham.vcf.gz.tbi
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.sr.txt.gz.tbi to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.sr.txt.gz.tbi
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.sd.txt.gz to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.sd.txt.gz
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.pe.txt.gz to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.pe.txt.gz
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.sr.txt.gz to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.sr.txt.gz
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.pe.txt.gz.tbi to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.pe.txt.gz.tbi
DRY RUN: Would have copied gs://cpg-seqr-main-tmp/cromwell/GatherSampleEvidence/xxx-xxx-xxx-xxx-xxx/call-CollectSVEvidence/CollectSVEvidence/yyy-yyy-yyy-yyy-yyy/call-RunCollectSVEvidence/CPGxxxxxx.sd.txt.gz.tbi to gs://cpg-my-dataset-main/sv_evidence/CPGxxxxxx.sd.txt.gz.tbi
No manta outputs found for CPGxxxxxx.
Dataset: my-dataset
Sequencing Group ID: CPGxxxxxx, Would create: 2 SV analyses

In this case, the Scramble and Whamg sub-workflows succeeded, as did CollectCounts and CollectSVEvidence. So we copy all of their output files to the dataset's main bucket under the sv_evidence/ prefix, and create two SV analyses: one for the Scramble result and one for the Whamg result.
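The destination paths in the dry run follow a simple rule: each file keeps its basename and is placed under gs://cpg-<dataset>-main/sv_evidence/. Below is a minimal sketch of that rule plus the analysis count. It assumes, hypothetically, that each of Scramble, Whamg and Manta yields one SV analysis when it produced a `vcf` output. `plan_copies`, `count_sv_analyses` and `SV_CALLERS` are illustrative names, and the source paths are shortened placeholders. The sketch only builds the plan; performing the copies and creating the Metamist analyses are left out.

```python
import posixpath

# Assumption: callers whose VCF output each becomes one SV analysis
# (Scramble and Whamg in the dry run; Manta would add a third if it finished).
SV_CALLERS = ("Scramble", "Whamg", "Manta")


def plan_copies(
    outputs: dict[str, dict[str, str]], dataset: str
) -> list[tuple[str, str]]:
    """Pair each rescued gs:// path with its destination under sv_evidence/."""
    dest_prefix = f"gs://cpg-{dataset}-main/sv_evidence"
    return [
        (src, f"{dest_prefix}/{posixpath.basename(src)}")
        for files in outputs.values()
        for src in files.values()
    ]


def count_sv_analyses(outputs: dict[str, dict[str, str]]) -> int:
    """Count one SV analysis per caller that produced a VCF."""
    return sum(1 for caller in SV_CALLERS if outputs.get(caller, {}).get("vcf"))


outputs = {
    "Scramble": {"vcf": "gs://tmp-bucket/call-Scramble/CPGxxxxxx.scramble.vcf.gz"},
    "CollectCounts": {"counts": "gs://tmp-bucket/call-CollectCounts/CPGxxxxxx.counts.tsv.gz"},
    "Whamg": {"vcf": "gs://tmp-bucket/call-Whamg/CPGxxxxxx.wham.vcf.gz"},
}
for src, dst in plan_copies(outputs, "my-dataset"):
    print(f"DRY RUN: Would have copied {src} to {dst}")
print(f"Would create: {count_sv_analyses(outputs)} SV analyses")  # 2
```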