rabix / bunny

[Legacy] Executor for CWL workflows. Executes sbg:draft-2 and CWL 1.0
http://rabix.io
Apache License 2.0

Failed to stage secondary files #442

Open Shenglai opened 6 years ago

Shenglai commented 6 years ago

Hi all,

I have a workflow that downloads its input files from S3 by UUID (e.g. example.gz and example.gz.tbi are each downloaded separately, by their own UUIDs). In later steps, these pre-downloaded files should be staged together as a "file, secondary file" structure.

My intention is to use InitialWorkDirRequirement to avoid unnecessary copying of the input files.

Here is my CWL:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0

requirements:
  - class: DockerRequirement
    dockerPull: alpine
  - class: InlineJavascriptRequirement
  - class: InitialWorkDirRequirement
    listing: |
      ${
           var ret = [{"entryname": inputs.parent_file.basename, "entry": inputs.parent_file}];
           for( var i = 0; i < inputs.children.length; i++ ) {
               ret.push({"entryname": inputs.children[i].basename, "entry": inputs.children[i]});
           };
           return ret
       }

class: CommandLineTool

inputs:
  parent_file:
    type: File

  children:
    type: File[]

outputs:
  output:
    type: File
    outputBinding:
      glob: $(inputs.parent_file.basename)
    secondaryFiles: |
      ${
         var ret = [];
         var locbase = self.location.substr(0, self.location.lastIndexOf('/'))
         for( var i = 0; i < inputs.children.length; i++ ) {
           ret.push({"class": "File", "location": locbase + '/' + inputs.children[i].basename});
         }
         return ret
       }

baseCommand: "true"

The output from the cwltool engine is:

[job make_secondary.cwl] completed success
{
    "output": {
        "checksum": "sha1$318c739ad52530f8913cc71c2ade57f75b5c4079", 
        "basename": "a", 
        "location": "file:///mnt/benchmark/tmp/a", 
        "secondaryFiles": [
            {
                "checksum": "sha1$49a9cd3ef8381da2b841001fc4f9bba9b9e1fbed", 
                "basename": "b", 
                "location": "file:///mnt/benchmark/tmp/b", 
                "path": "/mnt/benchmark/tmp/b", 
                "class": "File", 
                "size": 10
            }
        ], 
        "path": "/mnt/benchmark/tmp/a", 
        "class": "File", 
        "size": 10
    }
}
Final process status is success

However, with the latest rabix (1.0.5), the same tool fails:

[2018-04-17 20:10:39.425] [INFO] Job root has started
[2018-04-17 20:10:39.594] [INFO] Pulling docker image alpine:latest
[2018-04-17 20:10:40.279] [INFO] Running command line: true
[2018-04-17 20:10:42.311] [ERROR] Failed to execute status command for root. Could not collect outputs.
org.rabix.executor.ExecutorException: Could not collect outputs.
        at org.rabix.executor.handler.impl.JobHandlerImpl.postprocess(JobHandlerImpl.java:318) ~[rabix-cli.jar:na]
        at org.rabix.executor.execution.command.StatusCommand.run(StatusCommand.java:52) ~[rabix-cli.jar:na]
        at org.rabix.executor.execution.JobHandlerCommand.run(JobHandlerCommand.java:51) [rabix-cli.jar:na]
        at org.rabix.executor.execution.JobHandlerRunnable.run(JobHandlerRunnable.java:58) [rabix-cli.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_141]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_141]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141]
Caused by: org.rabix.bindings.BindingException: org.rabix.bindings.cwl.service.CWLGlobException: Failed to extract outputs.
        at org.rabix.bindings.cwl.CWLProcessor.postprocess(CWLProcessor.java:150) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.postprocess(CWLProcessor.java:156) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLBindings.postprocess(CWLBindings.java:89) ~[rabix-cli.jar:na]
        at org.rabix.executor.handler.impl.JobHandlerImpl.postprocess(JobHandlerImpl.java:290) ~[rabix-cli.jar:na]
        ... 6 common frames omitted
Caused by: org.rabix.bindings.cwl.service.CWLGlobException: Failed to extract outputs.
        at org.rabix.bindings.cwl.CWLProcessor.globFiles(CWLProcessor.java:388) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.collectOutput(CWLProcessor.java:312) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.collectOutputs(CWLProcessor.java:174) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.postprocess(CWLProcessor.java:146) ~[rabix-cli.jar:na]
        ... 9 common frames omitted
Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.String
        at org.rabix.bindings.cwl.CWLProcessor.getSecondaryFiles(CWLProcessor.java:459) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.formFileValue(CWLProcessor.java:400) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.globFiles(CWLProcessor.java:386) ~[rabix-cli.jar:na]
        ... 12 common frames omitted
[2018-04-17 20:10:42.311] [INFO] Failed to execute status command for root. Could not collect outputs.
Failed to execute status command for root. Could not collect outputs.

I'm wondering whether this is already a known issue and whether there is a workaround for my case. Thank you very much in advance.

kinow commented 2 years ago

The example above fails for me with the latest cwltool.

kinow@ranma:/tmp/bunny-1.0.6$ mkdir /tmp/cwl
kinow@ranma:/tmp/bunny-1.0.6$ touch /tmp/cwl/a
kinow@ranma:/tmp/bunny-1.0.6$ touch /tmp/cwl/b
(venv) kinow@ranma:~/Development/python/workspace/cwl-v1.2$ cwltool /tmp/make_secondary.cwl --parent_file /tmp/cwl/a --children /tmp/cwl/b
INFO /home/kinow/Development/python/workspace/cwl-v1.2/venv/bin/cwltool 3.1.20220502060230
INFO Resolved '/tmp/make_secondary.cwl' to 'file:///tmp/make_secondary.cwl'
INFO [job make_secondary.cwl] /tmp/fk6y5rac$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/fk6y5rac,target=/jKWVxj \
    --mount=type=bind,source=/tmp/v6sssram,target=/tmp \
    --mount=type=bind,source=/tmp/cwl/a,target=/jKWVxj/a,readonly \
    --mount=type=bind,source=/tmp/cwl/b,target=/jKWVxj/b,readonly \
    --workdir=/jKWVxj \
    --read-only=true \
    --user=1000:1000 \
    --rm \
    --cidfile=/tmp/tg7_zeg5/20220613143313-613169.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/jKWVxj \
    alpine \
    true
INFO [job make_secondary.cwl] Max memory used: 0MiB
ERROR [job make_secondary.cwl] Job error:
("Error collecting output for parameter 'output': ../../../../../../tmp/make_secondary.cwl:33:5: 'path'", {})
WARNING [job make_secondary.cwl] completed permanentFail
{}
WARNING Final process status is permanentFail

The problem seems to be in the secondaryFiles expression for the output.
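
To see what each runner is being handed, here is a minimal sketch that evaluates the output secondaryFiles expression outside of any CWL runner, in plain JavaScript. The `inputs` and `self` objects are mock values (the `file:///tmp/out/a` location is illustrative, not taken from a real run), and the function names are hypothetical. The first variant reproduces the expression from the tool, which returns File objects carrying only `class` and `location`. The second variant returns plain filename strings instead, which the CWL v1.0 description of secondaryFiles also allows for expression results. This is only a guess at a workaround: the `HashMap cannot be cast to java.lang.String` in the rabix stack trace suggests that rabix 1.0.5 expects strings at that point, and the `'path'` in the cwltool error suggests it looked up a field the returned objects do not have, but neither variant has been tested here against either runner.

```javascript
// Mock values standing in for what a CWL runner would supply.
// The input paths mirror /tmp/cwl/a and /tmp/cwl/b from the cwltool run above.
const inputs = {
  parent_file: { class: "File", basename: "a", location: "file:///tmp/cwl/a" },
  children: [{ class: "File", basename: "b", location: "file:///tmp/cwl/b" }],
};

// `self` is the primary output File matched by the glob (location is illustrative).
const self = { class: "File", basename: "a", location: "file:///tmp/out/a" };

// Variant 1: same logic as the expression in the tool. Each secondary file is
// returned as a File object with only `class` and `location` set.
function secondaryAsFileObjects(self, inputs) {
  const locbase = self.location.slice(0, self.location.lastIndexOf("/"));
  return inputs.children.map((child) => ({
    class: "File",
    location: locbase + "/" + child.basename,
  }));
}

// Variant 2 (assumption, untested on rabix and cwltool): return filename
// strings, to be resolved relative to the primary file by the runner.
function secondaryAsStrings(self, inputs) {
  return inputs.children.map((child) => child.basename);
}

console.log(JSON.stringify(secondaryAsFileObjects(self, inputs)));
// prints [{"class":"File","location":"file:///tmp/out/b"}]
console.log(JSON.stringify(secondaryAsStrings(self, inputs)));
// prints ["b"]
```

If the string form does work, the body of the second function is what would go inside the `${ ... }` block of the output's secondaryFiles field. Another option to try would be keeping the File objects but also setting `basename` and `path` on each one.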