umccr / cwl-ica

A collection of cwl-ica workflows along with a user guide for the commands to use and contributions guide
MIT License

Scrape ICA information to find any other workflows in development_workflows or production_workflows that are not in project.yaml #149

Closed alexiswl closed 1 year ago

alexiswl commented 1 year ago

An incident occurred with https://github.com/umccr/cwl-ica/pull/148: project.yaml was incomplete, with the umccrise tool existing in production_workflows on ICA but not in project.yaml. This meant that

$ cwl-ica add-tool-to-project \
  --tool-path tools/umccrise/2.2.1--0/umccrise__2.2.1--0.cwl \
  --project "production_workflows"

Failed with

2022-10-19 22:03:49,281 - INFO     - cwl_tool                  - validate_object                          : LineNo. 128  - Running cwltool --validate "/Users/pdiakumis/projects/cwl-ica/tools/umccrise/2.2.1--0/umccrise__2.2.1--0.cwl"
2022-10-19 22:03:52,598 - INFO     - cwl_tool                  - validate_object                          : LineNo. 138  - Generating packed file
2022-10-19 22:03:54,950 - INFO     - cwl_tool                  - validate_object                          : LineNo. 142  - Running cwltool --validate on packed file
2022-10-19 22:03:57,924 - INFO     - cwl_tool                  - validate_object                          : LineNo. 149  - Collecting md5sum from packed file
2022-10-19 22:03:57,925 - INFO     - add_to_project            - __call__                                 : LineNo. 107  - Adding tool "umccrise/2.2.1--0" to project "production_workflows"
2022-10-19 22:03:58,098 - ERROR    - ica_workflow              - create_workflow_id                       : LineNo. 136  - Api exeception error when trying to create a workflow for umccrise
Traceback (most recent call last):
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/classes/ica_workflow.py", line 134, in create_workflow_id
    api_response = api_instance.create_workflow(body=body)
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/site-packages/libica/openapi/libwes/api/workflows_api.py", line 62, in create_workflow
    return self.create_workflow_with_http_info(**kwargs)  # noqa: E501
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/site-packages/libica/openapi/libwes/api/workflows_api.py", line 137, in create_workflow_with_http_info
    return self.api_client.call_api(
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/site-packages/libica/openapi/libwes/api_client.py", line 364, in call_api
    return self.__call_api(resource_path, method,
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/site-packages/libica/openapi/libwes/api_client.py", line 188, in __call_api
    raise e
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/site-packages/libica/openapi/libwes/api_client.py", line 181, in __call_api
    response_data = self.request(
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/site-packages/libica/openapi/libwes/api_client.py", line 407, in request
    return self.rest_client.POST(url,
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/site-packages/libica/openapi/libwes/rest.py", line 265, in POST
    return self.request("POST", url,
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/site-packages/libica/openapi/libwes/rest.py", line 224, in request
    raise ApiException(http_resp=r)
libica.openapi.libwes.exceptions.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 19 Oct 2022 11:03:58 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods': 'GET, PUT, POST, DELETE, PATCH, OPTIONS', 'Access-Control-Allow-Headers': 'DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization'})
HTTP response body: {"code":"Workflows.Workflow.WorkflowAlreadyExists","message":"A workflow with the given name 'umccrise_prod-wf' already exists."}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pdiakumis/conda/envs/cwl-ica/bin/cwl-ica", line 13, in <module>
    sys.exit(main())
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/utils/cli.py", line 556, in main
    _dispatch()
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/utils/cli.py", line 390, in _dispatch
    tool_add_obj()
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/subcommands/updaters/add_to_project.py", line 108, in __call__
    self.project.add_item_to_project(self.item_type_key, self.cwl_obj,
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/classes/project_production.py", line 112, in add_item_to_project
    this_project_ica_item.create_workflow_id(access_token, self.project_id, linked_projects=self.linked_projects)
  File "/Users/pdiakumis/conda/envs/cwl-ica/lib/python3.8/classes/ica_workflow.py", line 137, in create_workflow_id
    raise ApiException
libica.openapi.libwes.exceptions.ApiException: (None)
Reason: None

alexiswl commented 1 year ago

Solution - find all missing workflows in development_workflows / production_workflows and add them to project.yaml
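
A name-level check along these lines could also catch this kind of gap before add-tool-to-project reaches the API. This is a minimal sketch only: workflow_name_exists is a hypothetical helper (not part of cwl-ica), the name list is hard-coded rather than fetched, and example_prod-wf is a made-up name. In practice the names would have to come from the ica CLI and jq, and the exact JSON field to read is an assumption to verify.

```shell
# Sketch only: workflow_name_exists is a hypothetical helper, not part of cwl-ica.
# Reads workflow names (one per line) on stdin and succeeds if "$1" is among them.
workflow_name_exists() {
  # -F: fixed string, -x: match the whole line, -q: exit status only
  grep -Fxq -- "$1"
}

# Stand-in for the names registered on ICA (hard-coded for illustration;
# example_prod-wf is a made-up name).
registered_names="$(printf '%s\n' 'umccrise_prod-wf' 'example_prod-wf')"

if printf '%s\n' "${registered_names}" | workflow_name_exists 'umccrise_prod-wf'; then
  result="exists"
else
  result="missing"
fi

echo "umccrise_prod-wf: ${result}"
```

With the sample list above, the name from the 409 response is reported as already existing, which is the case where project.yaml should be checked before creating anything.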

alexiswl commented 1 year ago

Development workflows

ica-context-switcher --scope admin --project-name development_workflows

on_ica="$( \
  ica workflows list \
    --max-items=0 \
    --output-format json | \
  jq --raw-output \
    '
      .items | 
      map(.id) | 
      sort | 
      .[]
    ' \
)"

in_config_yaml="$( \
  yq \
    '
      .projects | 
      map(
        select(
          .project_name=="development_workflows"
        )
      ) | 
      .[] | 
     (.workflows + .tools) | 
     map(
      .ica_workflow_id
     ) | 
     sort | 
     .[]
    ' < config/project.yaml \
)"

Use the comm command to see which workflows are on ICA but not in the config yaml

comm -23 \
  <(echo "${on_ica}") \
  <(echo "${in_config_yaml}") | \
xargs -I{} \
  bash -c "ica workflows get {} -o json | jq -r '.name'"
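
To make the comm step concrete, here is a small self-contained sketch. The IDs are hypothetical and the lists are written to temporary files rather than passed via process substitution. comm expects both inputs sorted in the same collation order; forcing LC_ALL=C here is an assumption added for the example, not something the commands above do.

```shell
# Hypothetical IDs, for illustration only.
workdir="$(mktemp -d)"
printf '%s\n' wfl.aaa wfl.bbb wfl.ccc | LC_ALL=C sort > "${workdir}/on_ica.txt"
printf '%s\n' wfl.aaa wfl.ccc | LC_ALL=C sort > "${workdir}/in_config.txt"

# -2 suppresses lines unique to the second file and -3 suppresses lines
# common to both, leaving only lines unique to the first file.
missing_from_config="$(LC_ALL=C comm -23 "${workdir}/on_ica.txt" "${workdir}/in_config.txt")"

echo "${missing_from_config}"
rm -rf "${workdir}"
```

With this sample data only wfl.bbb is printed, i.e. the one ID that is on ICA but absent from the config list.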

Yields

bclconvert_dev-wf
custom-filter-vcf_dev-wf
bclconvert-with-qc-pipeline_dev-wf
dragen-somatic-with-germline-pipeline_dev-wf
map_resource_requirements_dev-wf
bclconvert-scatter_dev-wf
bcl-convert_dev-wf
validate-bclconvert-samplesheet_dev-wf

The following are still in PRs, so it makes sense that they're not in the config yaml on the main branch:

bclconvert_dev-wf
bclconvert-with-qc-pipeline_dev-wf
dragen-somatic-with-germline-pipeline_dev-wf
bclconvert-scatter_dev-wf
bcl-convert_dev-wf
validate-bclconvert-samplesheet_dev-wf

map-resource_requirements_dev-wf is present on the test_resource_mapping branch, from commit c63ff6bf6f30c1b57b51e5791a0e9fccd5f0a27b.

So only custom-filter-vcf_dev-wf is unaccounted for in development_workflows.

Production workflows

For production workflows, running the same commands (swapping development_workflows for production_workflows in the project name and context switcher commands) gives the following values for on_ica and in_config_yaml

# on_ica var
wfl.230846758ccf42e3831283ab0e45af0a
wfl.23f61cb1baab412a8c37dc93bed6c2af
wfl.438287b9982744c388aa9ef0136dc59a
wfl.576020a89adb49c3b2081a620d19104d
wfl.714e9172f3674023b210ccc7c47db05a
wfl.7e5ba7470b5549a6b4bf6d95daaa1214
wfl.7ed9c6014ac9498fbcbd4c17c28bc0d4
wfl.87e07ae6b46645a181e04813de535216
wfl.925c7ba0199a4ad6bb5341a3cffd191d
wfl.9e5bdd810ef8404fabb29f8d71214131
wfl.aa0ccece4e004839aa7374d1d6530633
wfl.b41302dd537d44fd92ae719a367d69fd
wfl.f257ca35ced94e648fdda1173144c476
# in config yaml var
wfl.230846758ccf42e3831283ab0e45af0a
wfl.23f61cb1baab412a8c37dc93bed6c2af
wfl.438287b9982744c388aa9ef0136dc59a
wfl.576020a89adb49c3b2081a620d19104d
wfl.714e9172f3674023b210ccc7c47db05a
wfl.7e5ba7470b5549a6b4bf6d95daaa1214
wfl.7ed9c6014ac9498fbcbd4c17c28bc0d4
wfl.87e07ae6b46645a181e04813de535216
wfl.925c7ba0199a4ad6bb5341a3cffd191d
wfl.9e5bdd810ef8404fabb29f8d71214131
wfl.aa0ccece4e004839aa7374d1d6530633
wfl.b41302dd537d44fd92ae719a367d69fd
wfl.f257ca35ced94e648fdda1173144c476

Which are exactly the same, kudos!
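
Rather than comparing the two lists by eye, the check can be done in both directions at once. This is a rough sketch using two of the production IDs above as sample data for both sides. The file names and the LC_ALL=C setting are assumptions for the example.

```shell
workdir="$(mktemp -d)"
# Two of the production IDs listed above, used as sample data for both sides.
printf '%s\n' \
  wfl.230846758ccf42e3831283ab0e45af0a \
  wfl.23f61cb1baab412a8c37dc93bed6c2af > "${workdir}/on_ica.txt"
cp "${workdir}/on_ica.txt" "${workdir}/in_config.txt"

# comm -3 drops the common column, so any output at all means the lists differ
# (entries only on ICA, or stale entries only in the config).
differences="$(LC_ALL=C comm -3 "${workdir}/on_ica.txt" "${workdir}/in_config.txt")"

if [ -z "${differences}" ]; then
  verdict="in sync"
else
  verdict="out of sync"
fi

echo "${verdict}"
rm -rf "${workdir}"
```

Unlike comm -23, this also flags IDs that are in the config yaml but no longer on ICA.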

alexiswl commented 1 year ago

@skanwal

Running

ica workflows versions list wfl.a47748cad84d4ca0a41d39a1d40eeb8d

I get

ID                                      NAME    LANGUAGE        STATUS  TIMECREATED
wfv.ab72fd79ee79443b9647cc2db285d929    0.1.0   CWL             Draft   2022-10-05 10:45:14.072 +1100 AEDT

Then running

ica workflows versions get wfl.a47748cad84d4ca0a41d39a1d40eeb8d 0.1.0 \
 --output-format json | \
jq --raw-output \
  '
    .definition | 
    fromjson' | \
yq --prettyPrint --unwrapScalar

I get

class: CommandLineTool
id: '#main'
label: map_resource_requirements v(0.1.0)
doc: |
  Documentation for map_resource_requirements v0.1.0
hints:
  - dockerPull: bash:5
    class: DockerRequirement
  - coresMin: 2
    ramMin: 4000
    class: ResourceRequirement
    http://platform.illumina.com/rdf/ica/resources:
      type: $(calcInstanceType(inputs.input_size))
      size: small
requirements:
  - expressionLib:
      - |
        var calcMem = function calcMem(instanceType) {
          var mem_from_instance_type = {
              "small"  : "64m",
              "medium" : "256m",
              "large"  : "1g",
              }

          if (!(instanceType in mem_from_instance_type)) {
              return "default-mem"
              }

          return mem_from_instance_type[instanceType];
          }

        var calcInstanceType = function calcInstanceType(type) {
          var instance_type_from_mem = {
              "small"  : "Standard",
              "medium" : "StandardHiCpu",
              "large"   : "StandardHighIo"
              }

          if (!(type in instance_type_from_mem)) {
              return "Standard"
              }

          return instance_type_from_mem[type];
          }

        var calcCoresNumber = function calcCoresNumber(cores) {
          var cores_from_type = {
              "small"  : 2,
              "medium" : 3,
              "large"   : 4
              }

          if (!(cores in cores_from_type)) {
              return 2
              }

          return cores_from_type[cores];
          }
    class: InlineJavascriptRequirement
baseCommand:
  - echo
inputs:
  - label: input size
    doc: |
      The input size
    type:
      - "null"
      - string
    default: 128m
    inputBinding:
      prefix: --mem
      valueFrom: $(calcMem(self))
    id: '#input_size'
outputs:
  - label: output file
    doc: |
      The output file
    type: File
    id: '#output_file'
    outputBinding:
      glob: stdout.txt
stdout: stdout.txt
successCodes:
  - 0
https://schema.org/author:
  class: https://schema.org/Person
  https://schema.org/name: Sehrish Kanwal
  https://schema.org/email: sehrish.kanwal@umccr.org
cwlVersion: v1.1
$schemas:
  - https://schema.org/version/latest/schemaorg-current-http.rdf
$namespaces:
  s: https://schema.org/
  ilmn-tes: http://platform.illumina.com/rdf/ica/

Could this be your map-resource_requirements_dev-wf workflow, sitting on a branch somewhere?

skanwal commented 1 year ago

@alexiswl - yes, it's my workflow that I created for opening up an issue with Illumina. It's on a branch; perhaps I can open up a PR against main to resolve this issue?

alexiswl commented 1 year ago

No worries. Can you link the branch it's on here, and I'll close the issue?

skanwal commented 1 year ago

@alexiswl - I have opened up a PR for this tool: https://github.com/umccr/cwl-ica/pull/164

alexiswl commented 1 year ago

Resolved by #164