rabix / bunny

[Legacy] Executor for CWL workflows. Executes sbg:draft-2 and CWL 1.0
http://rabix.io
Apache License 2.0

Scatter with one job does not output an array #92

Closed bogdang989 closed 7 years ago

bogdang989 commented 7 years ago

If a node is scattered and only one job is created, the output is a single object instead of an array of objects.
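
To make the expected behavior concrete, here is a minimal Python sketch of scatter semantics. It is not bunny's implementation; `run_scattered` and `wrap` are hypothetical names used only for illustration. The point is that the result is a list with one entry per job, even when there is exactly one job.

```python
from typing import Any, Callable, Dict, List


def run_scattered(job: Callable[[Any], Dict[str, Any]],
                  values: List[Any]) -> List[Dict[str, Any]]:
    """Run `job` once per element of `values` and always return a list."""
    if not isinstance(values, list):
        raise TypeError("scatter input must be a list")
    # One result per element; a one-element input yields a one-element list,
    # never a bare object.
    return [job(value) for value in values]


def wrap(value: Any) -> Dict[str, Any]:
    # Hypothetical job: wraps its input in an output record.
    return {"out": value}


two_jobs = run_scattered(wrap, ["a", "b"])
one_job = run_scattered(wrap, ["a"])
print(two_jobs)
print(one_job)
```

With a single-element input, `one_job` is still a list containing one record, which is the shape downstream steps expecting an array would need.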

chapmanb commented 7 years ago

I've been testing bcbio CWL runs with the latest bunny 1.0.0-rc2 and see similar problems. Anything meant for a scatter, including the initial input JSON, gets passed as a single array of all objects. So if you have an input like:

{
    "description": [
        "Test1",
        "Test2"
    ]
}

instead of running two scattered jobs with description: Test1 and description: Test2 as inputs, you get one job with description: [Test1, Test2]. Thanks for all the work on getting bunny up to date with CWL 1.0; I'm excited to have this running with bcbio-generated CWL.

StarvingMarvin commented 7 years ago

@chapmanb It's probably an unrelated issue. @stefanristeski did some work testing bcbio on bunny and identified some bunny issues, but also instances where the bcbio script doesn't produce valid CWL v1.0. For scatter specifically, bcbio declares scatter as step_id/input_id, but for v1.0 it should be only input_id.
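
As a rough illustration of the step_id/input_id versus input_id difference, the following Python sketch strips a step prefix from a scatter target. The function name and the example ids are hypothetical, and this is only an assumption about how a generator might adjust its output, not code from bcbio or bunny.

```python
def to_v1_scatter_target(target: str, step_id: str) -> str:
    """Return `target` without a leading `step_id/` prefix, if it has one."""
    prefix = step_id + "/"
    if target.startswith(prefix):
        # Older style: "step_id/input_id" -> "input_id"
        return target[len(prefix):]
    # Already in the bare input_id form; leave it unchanged.
    return target


old_style = to_v1_scatter_target("align/reads", "align")
new_style = to_v1_scatter_target("reads", "align")
print(old_style, new_style)
```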

simonovic86 commented 7 years ago

@bogdang989 I've fixed it. This is the patch: 069c1e8233ab16c2ac6cb76c2970afe4d05b1a75

Can you please verify that it's working? Thanks

bogdang989 commented 7 years ago

@simonovic86 It works! Thanks a lot

chapmanb commented 7 years ago

Luka -- thanks much for the tip, I hadn't realized the specification for this changed. It's so useful to have a separate implementation to help shake out these issues.

If I swap that over it does try to scatter but I immediately get an error:

org.rabix.engine.processor.handler.EventHandlerException: Port config__algorithm__align_split_size for root.alignment.1.prep_align_inputs and rootId 3f628f13-951b-41ed-8f06-0afc11ef1314 is not a list and therefore cannot be scattered.

The input is a list:

    "config__algorithm__align_split_size": [
        25000,
        25000
    ],

and I think it is specified correctly:

inputs:
- id: config__algorithm__align_split_size
  type:
    items: long
    type: array
[...]
  scatter:
  - config__algorithm__align_split_size

Am I doing something obviously wrong here as well? I can also put together a repo with all of the test data to reproduce.

Thanks again for the help with this.

StarvingMarvin commented 7 years ago

Can you attach the entire workflow so we can examine it? Did you try specifying the type as just long? As specified in your comment, I would expect that the input should be a list of lists of longs, so that when scattered, each invocation gets a list of longs.
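
A small Python sketch of that expectation, using the port name from the error message above. `scatter_port` is a hypothetical helper, not bunny's API; it only illustrates how the shape of the input value determines what each scattered job receives.

```python
from typing import Any, Dict, List


def scatter_port(port: str, value: Any) -> List[Dict[str, Any]]:
    """Split `value` into one input object per scattered job."""
    if not isinstance(value, list):
        raise ValueError(
            f"Port {port} is not a list and therefore cannot be scattered."
        )
    return [{port: item} for item in value]


port = "config__algorithm__align_split_size"

# A flat list of longs: each job receives a single long.
flat_jobs = scatter_port(port, [25000, 25000])

# A list of lists of longs: each job receives a list of longs,
# which is what a tool input declared as an array of long expects.
nested_jobs = scatter_port(port, [[25000], [25000]])

print(flat_jobs)
print(nested_jobs)
```

Under this reading, declaring the scattered input as an array of long means the value supplied to the step should be nested one level deeper, or the input type should be plain long.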

StarvingMarvin commented 7 years ago

@chapmanb, @stefanristeski I've opened another GitHub issue, #94, to track bcbio support.