rabix / bunny

[Legacy] Executor for CWL workflows. Executes sbg:draft-2 and CWL 1.0
http://rabix.io
Apache License 2.0

TES backend doesn't seem to support the CWL allowed exit codes #408

Open kmhernan opened 6 years ago

kmhernan commented 6 years ago

When using bunny + TES, a task that exits with a non-zero exit code that is considered acceptable under the CWL spec is still interpreted as failed by the rabix engine. I have tested the same workflow with the local execution backend and it runs as expected. While TES may return the error state, the exit code is stored in the TES TaskLog message and should be used in this case to override the error state provided by the TES backend.
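A minimal sketch of the override described above, assuming the engine already has the exit code from the TES task log and the tool's successCodes in hand. The class, enum, and method names are hypothetical and not taken from the bunny source, and the rule is simplified: it treats 0 and any listed success code as success and ignores temporaryFailCodes and permanentFailCodes.

```java
import java.util.Set;

final class ExitCodeEvaluation {
    enum Outcome { SUCCESS, FAILED }

    /**
     * Decides the job outcome from the exit code when one is available,
     * and falls back to the backend's own state only when it is not.
     * All names here are illustrative, not bunny's API.
     */
    static Outcome evaluate(boolean backendReportedError,
                            Integer exitCode,
                            Set<Integer> successCodes) {
        if (exitCode == null) {
            // No exit code in the task log: trust the backend state.
            return backendReportedError ? Outcome.FAILED : Outcome.SUCCESS;
        }
        // The exit code overrides the backend's error state.
        boolean ok = exitCode == 0 || successCodes.contains(exitCode);
        return ok ? Outcome.SUCCESS : Outcome.FAILED;
    }
}
```

With this rule, a task that the backend marks as errored but that exited with a code listed in successCodes would be reported as successful.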

milos-ljubinkovic commented 6 years ago

Are the output files uploaded when the TES task errors-out?

buchanae commented 6 years ago

> Are the output files uploaded when the TES task errors-out?

Ah, I hadn't thought about this.

Currently, Funnel will stop processing on the first failed executor, and will not upload output files. We've discussed changing this behavior, in order to provide a sort of "best effort" behavior, where Funnel tries to get you all the data it has. In other words, we could try to make Funnel upload any outputs it can find. There are some details to iron out there though. Currently it's an error if an output isn't found, which wouldn't be true in this situation.

milos-ljubinkovic commented 6 years ago

Bunny could wrap tools that define successCodes in a command that always exits with 0 but stores the actual exit code somewhere, and then evaluate the success state in the postprocess stage. This makes sense because it's a CWL feature, so it should work independently of Funnel's support for it.
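The wrapping idea above could look roughly like the sketch below. It assumes the tool command is available as a single shell string that will be run through something like `sh -c`; the class name, method names, and the `exit_code.txt` file name are all hypothetical and are not taken from the bunny source.

```java
final class ExitCodeWrapper {
    // Hypothetical file name; the real location is up to the engine.
    static final String EXIT_CODE_FILE = "exit_code.txt";

    /** Wraps a shell command so the wrapper always exits 0 and records the real exit code. */
    static String wrap(String command) {
        // The subshell keeps an `exit` inside the tool command from ending the wrapper early.
        return "( " + command + " ); echo $? > " + EXIT_CODE_FILE + "; exit 0";
    }

    /** Parses the recorded exit code back out of the file's contents. */
    static int parseRecordedExitCode(String fileContents) {
        return Integer.parseInt(fileContents.trim());
    }

    public static void main(String[] args) {
        // Prints the wrapped form of an example command.
        System.out.println(wrap("grep -q pattern input.txt"));
    }
}
```

The recorded value would then be read back in the postprocess stage and compared against successCodes. Naive string concatenation like this is fragile for commands with unusual quoting or redirection, so a real implementation would need more care when building the command line.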

kmhernan commented 6 years ago

Yeah, I was using a shared FS in these tests when I noticed it, but no, the output wasn’t copied/linked over to the bunny directory structure from what I can see.

milos-ljubinkovic commented 6 years ago

I've made some quick changes on this branch: https://github.com/rabix/bunny/tree/tes/exitcodes

If there are allowed exit codes in the app, the exit code is saved and overridden to 0 inside TES, and then independently evaluated after execution.

I changed the way the command line is built to accommodate this, so some side effects with unusual command lines might happen.

kmhernan commented 6 years ago

Awesome @milos-ljubinkovic ... my quick peek at the source suggests that this branch also supports the newer TES spec, correct? Since I'm testing with the newer Funnel versions that use the newer TES spec, I have had to edit the source of older rabix versions... just double checking so I can test it with my workflow.

milos-ljubinkovic commented 6 years ago

It should support the latest TES spec and was tested against funnel's master branch on 10th January I think. Some issues with s3 and endpoints were reported, though.

kmhernan commented 6 years ago

great... yeah we gave up on s3 for now and testing with ceph FS... will test this today thanks

adamstruck commented 6 years ago

We are tracking this issue in Funnel https://github.com/ohsu-comp-bio/funnel/issues/425

kmhernan commented 6 years ago

@milos-ljubinkovic it seems like I can't get around this exception with this branch:

java.lang.IllegalArgumentException: Illegal character in scheme name at index 0: {
  "appFileLocation" : "/mnt/cephfs/cwls/jeremiah/gdc-dnaseq-cwl/workflows/dnaseq/metrics.cwl",
  "successCodes" : [ ],
  "cwlVersion" : "v1.0",
  "inputs" : [ {
    "id" : "bam",
    "type" : "File",
    "scatter" : true
  }, {
    "id" : "known_snp",
    "type" : "File"
  }, {

It happens on both local and TES backends.

milos-ljubinkovic commented 6 years ago

I made a quick revert on that branch of a change that had something to do with ignoring IllegalArgumentExceptions, so it might help, but I haven't actually reproduced the issue. Could you upload your workflow or the full stack trace?

kmhernan commented 6 years ago

@milos-ljubinkovic I think that's where the issue is. Here's more of the stack trace that I can easily grep out... I'm running again with your changes right now.

java.lang.IllegalArgumentException: Illegal character in scheme name at index 0: {
    at java.net.URI.create(URI.java:852) ~[na:1.8.0_141]
    at org.rabix.bindings.cwl.resolver.CWLDocumentResolver.resolve(CWLDocumentResolver.java:100) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.helper.CWLJobHelper.getCWLJob(CWLJobHelper.java:20) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.CWLProcessor.transformInputs(CWLProcessor.java:519) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.CWLBindings.transformInputs(CWLBindings.java:175) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handleTransform(JobStatusEventHandler.java:356) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.ready(JobStatusEventHandler.java:289) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:109) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:43) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
    at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:99) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:27) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
    at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.ScatterHandler.createScatteredJobs(ScatterHandler.java:222) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.ScatterHandler.scatterPort(ScatterHandler.java:115) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.ready(JobStatusEventHandler.java:277) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:109) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:43) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
    at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:99) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:27) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
    at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:112) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:34) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
    at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:112) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:34) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
    at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:160) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:43) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.handle(EventProcessorImpl.java:175) [rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.lambda$doProcessEvent$3(EventProcessorImpl.java:108) [rabix-cli.jar:na]
    at org.rabix.engine.store.memory.InMemoryRepositoryRegistry.doInTransaction(InMemoryRepositoryRegistry.java:92) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.doProcessEvent(EventProcessorImpl.java:107) [rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.lambda$null$1(EventProcessorImpl.java:91) [rabix-cli.jar:na]
    at org.rabix.engine.metrics.impl.MetricsHelperImpl.time(MetricsHelperImpl.java:78) ~[rabix-cli.jar:na]
    at org.rabix.engine.processor.impl.EventProcessorImpl.lambda$start$2(EventProcessorImpl.java:91) [rabix-cli.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_141]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_141]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141]
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 0: {
    at java.net.URI$Parser.fail(URI.java:2848) ~[na:1.8.0_141]
    at java.net.URI$Parser.checkChars(URI.java:3021) ~[na:1.8.0_141]
    at java.net.URI$Parser.checkChar(URI.java:3031) ~[na:1.8.0_141]
    at java.net.URI$Parser.parse(URI.java:3047) ~[na:1.8.0_141]
    at java.net.URI.<init>(URI.java:588) ~[na:1.8.0_141]
    at java.net.URI.create(URI.java:850) ~[na:1.8.0_141]
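
The trace above bottoms out in `java.net.URI.create` being handed a string that starts with `{`, which is a serialized app document rather than a file reference. The sketch below reproduces that behavior in isolation and shows one possible guard; the class and method names are hypothetical and this is not the code in `CWLDocumentResolver`.

```java
import java.net.URI;
import java.util.Optional;

final class AppReferenceCheck {
    /**
     * Returns a URI when the text parses as one, or empty when it looks like
     * an inline JSON document or is otherwise not a valid URI.
     * Illustrative helper only, not part of bunny.
     */
    static Optional<URI> tryParseUri(String appText) {
        String trimmed = appText.trim();
        if (trimmed.startsWith("{")) {
            // Inline JSON app, not a reference to a file or URL.
            return Optional.empty();
        }
        try {
            return Optional.of(URI.create(trimmed));
        } catch (IllegalArgumentException e) {
            // URI.create wraps URISyntaxException in IllegalArgumentException.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        try {
            URI.create("{ \"cwlVersion\" : \"v1.0\" }");
        } catch (IllegalArgumentException e) {
            // Same exception type as in the trace above.
            System.out.println(e.getMessage());
        }
        System.out.println(tryParseUri("/mnt/cephfs/app.cwl").isPresent());
    }
}
```

A caller could use the empty result to treat the text as an already-resolved document instead of trying to load it from a location.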