rabix / bunny

[Legacy] Executor for CWL workflows. Executes sbg:draft-2 and CWL 1.0
http://rabix.io
Apache License 2.0

S3 files not supported in 1.0.4 release #407

Closed adamstruck closed 6 years ago

adamstruck commented 6 years ago

Encountered this error when trying to test the latest release.

cwl:

cwlVersion: v1.0
class: CommandLineTool
doc: "Invoke 'samtools index' to create a 'BAI' index"

requirements:
  - class: DockerRequirement
    dockerPull: biocontainers/samtools:latest
  - class: InitialWorkDirRequirement
    listing: [ $(inputs.alignments) ]

inputs:
  alignments:
    type: File
    inputBinding:
      position: 1
      valueFrom: $(self.basename)
    label: Input bam file.

outputs:
  alignments_with_index:
    type: File
    secondaryFiles: .bai
    outputBinding:
      glob: $(inputs.alignments.basename)

baseCommand: ["samtools", "index"]

inputs:

{
  "alignments": {
    "location": "s3://funnel-test/NA12878.bam",
    "class": "File"
  }
}

logs:

[2018-01-10 14:41:10.047] [ERROR] Failed to use Bindings
org.rabix.bindings.BindingException: org.rabix.bindings.cwl.processor.CWLPortProcessorException: org.rabix.bindings.cwl.processor.CWLPortProcessorException: Error: Provider "s3" not installed while processing value: {metadata=null, format=null, dirname=null, nameroot=null, path=null, basename=null, size=null, nameext=null, contents=null, checksum=null, location=s3://funnel-test/NA12878.bam, secondaryFiles=[], class=File}
    at org.rabix.bindings.cwl.CWLProcessor.preprocess(CWLProcessor.java:108) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.CWLBindings.preprocess(CWLBindings.java:79) ~[rabix-cli.jar:na]
    at org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl$TaskRunCallable.call(LocalTESWorkerServiceImpl.java:236) [rabix-cli.jar:na]
    at org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl$TaskRunCallable.call(LocalTESWorkerServiceImpl.java:220) [rabix-cli.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_152]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]
Caused by: org.rabix.bindings.cwl.processor.CWLPortProcessorException: org.rabix.bindings.cwl.processor.CWLPortProcessorException: Error: Provider "s3" not installed while processing value: {metadata=null, format=null, dirname=null, nameroot=null, path=null, basename=null, size=null, nameext=null, contents=null, checksum=null, location=s3://funnel-test/NA12878.bam, secondaryFiles=[], class=File}
    at org.rabix.bindings.cwl.processor.callback.CWLPortProcessorHelper.setFileProperties(CWLPortProcessorHelper.java:70) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.CWLProcessor.preprocess(CWLProcessor.java:98) ~[rabix-cli.jar:na]
    ... 7 common frames omitted
Caused by: org.rabix.bindings.cwl.processor.CWLPortProcessorException: Error: Provider "s3" not installed while processing value: {metadata=null, format=null, dirname=null, nameroot=null, path=null, basename=null, size=null, nameext=null, contents=null, checksum=null, location=s3://funnel-test/NA12878.bam, secondaryFiles=[], class=File}
    at org.rabix.bindings.cwl.processor.CWLPortProcessor.processValues(CWLPortProcessor.java:56) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.CWLPortProcessor.processInputs(CWLPortProcessor.java:31) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.callback.CWLPortProcessorHelper.setFileProperties(CWLPortProcessorHelper.java:68) ~[rabix-cli.jar:na]
    ... 8 common frames omitted
Caused by: java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
    at java.nio.file.Paths.get(Paths.java:147) ~[na:1.8.0_152]
    at org.rabix.bindings.cwl.helper.CWLFileValueHelper.buildMissingInfo(CWLFileValueHelper.java:380) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.callback.CWLFilePropertiesProcessorCallback.process(CWLFilePropertiesProcessorCallback.java:28) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.CWLPortProcessor.processValue(CWLPortProcessor.java:70) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.CWLPortProcessor.processValues(CWLPortProcessor.java:54) ~[rabix-cli.jar:na]
    ... 10 common frames omitted
[2018-01-10 14:41:10.235] [ERROR] Failed to retrieve TESTask
java.util.concurrent.ExecutionException: org.rabix.backend.tes.service.TESServiceException: Failed to use Bindings
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_152]
    at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_152]
    at org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl$1.run(LocalTESWorkerServiceImpl.java:169) ~[rabix-cli.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_152]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_152]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_152]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_152]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]
Caused by: org.rabix.backend.tes.service.TESServiceException: Failed to use Bindings
    at org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl$TaskRunCallable.call(LocalTESWorkerServiceImpl.java:302) ~[rabix-cli.jar:na]
    at org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl$TaskRunCallable.call(LocalTESWorkerServiceImpl.java:220) ~[rabix-cli.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_152]
    ... 3 common frames omitted
Caused by: org.rabix.bindings.BindingException: org.rabix.bindings.cwl.processor.CWLPortProcessorException: org.rabix.bindings.cwl.processor.CWLPortProcessorException: Error: Provider "s3" not installed while processing value: {metadata=null, format=null, dirname=null, nameroot=null, path=null, basename=null, size=null, nameext=null, contents=null, checksum=null, location=s3://funnel-test/NA12878.bam, secondaryFiles=[], class=File}
    at org.rabix.bindings.cwl.CWLProcessor.preprocess(CWLProcessor.java:108) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.CWLBindings.preprocess(CWLBindings.java:79) ~[rabix-cli.jar:na]
    at org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl$TaskRunCallable.call(LocalTESWorkerServiceImpl.java:236) ~[rabix-cli.jar:na]
    ... 5 common frames omitted
Caused by: org.rabix.bindings.cwl.processor.CWLPortProcessorException: org.rabix.bindings.cwl.processor.CWLPortProcessorException: Error: Provider "s3" not installed while processing value: {metadata=null, format=null, dirname=null, nameroot=null, path=null, basename=null, size=null, nameext=null, contents=null, checksum=null, location=s3://funnel-test/NA12878.bam, secondaryFiles=[], class=File}
    at org.rabix.bindings.cwl.processor.callback.CWLPortProcessorHelper.setFileProperties(CWLPortProcessorHelper.java:70) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.CWLProcessor.preprocess(CWLProcessor.java:98) ~[rabix-cli.jar:na]
    ... 7 common frames omitted
Caused by: org.rabix.bindings.cwl.processor.CWLPortProcessorException: Error: Provider "s3" not installed while processing value: {metadata=null, format=null, dirname=null, nameroot=null, path=null, basename=null, size=null, nameext=null, contents=null, checksum=null, location=s3://funnel-test/NA12878.bam, secondaryFiles=[], class=File}
    at org.rabix.bindings.cwl.processor.CWLPortProcessor.processValues(CWLPortProcessor.java:56) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.CWLPortProcessor.processInputs(CWLPortProcessor.java:31) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.callback.CWLPortProcessorHelper.setFileProperties(CWLPortProcessorHelper.java:68) ~[rabix-cli.jar:na]
    ... 8 common frames omitted
Caused by: java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
    at java.nio.file.Paths.get(Paths.java:147) ~[na:1.8.0_152]
    at org.rabix.bindings.cwl.helper.CWLFileValueHelper.buildMissingInfo(CWLFileValueHelper.java:380) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.callback.CWLFilePropertiesProcessorCallback.process(CWLFilePropertiesProcessorCallback.java:28) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.CWLPortProcessor.processValue(CWLPortProcessor.java:70) ~[rabix-cli.jar:na]
    at org.rabix.bindings.cwl.processor.CWLPortProcessor.processValues(CWLPortProcessor.java:54) ~[rabix-cli.jar:na]
    ... 10 common frames omitted
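For reference, the innermost cause in the trace above can be reproduced with the JDK alone. This is a minimal sketch, assuming no S3 NIO provider is on the classpath; the names `ProviderLookup` and `resolve` are hypothetical and not part of bunny. `Paths.get(URI)` selects an installed `FileSystemProvider` by the URI scheme and throws `FileSystemNotFoundException` when none matches.

```java
import java.net.URI;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Paths;

public class ProviderLookup {
    /**
     * Tries to turn a location into a java.nio Path.
     * Returns the path as text, or a description of the lookup failure.
     */
    static String resolve(String location) {
        try {
            // Paths.get(URI) searches the installed providers for the URI scheme.
            return Paths.get(URI.create(location)).toString();
        } catch (FileSystemNotFoundException e) {
            // Thrown when no provider is registered for the scheme ("s3" here).
            return "FileSystemNotFoundException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("s3://funnel-test/NA12878.bam"));
    }
}
```

If this reading is right, the 1.0.4 failure means the S3 provider was not visible to the JVM at the point where the input file's properties were being filled in.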

adamstruck commented 6 years ago

I encounter a similar problem if I compile the master branch and rerun:

[2018-01-10 14:25:28.376] [DEBUG] Config path: /Users/strucka/Projects/bunny/rabix-cli/config
[2018-01-10 14:25:28.382] [DEBUG] Configuration directory found localy.
[2018-01-10 14:25:28.451] [DEBUG] Base path set to file:///Users/strucka/Projects/bunny/rabix-cli/config/core.properties
[2018-01-10 14:25:28.675] [DEBUG] Base path set to file:///Users/strucka/Projects/bunny/rabix-cli/config/core.properties
[2018-01-10 14:25:29.119] [DEBUG] Base path set to file:///Users/strucka/Projects/bunny/rabix-cli/config/core.properties
[2018-01-10 14:25:29.123] [DEBUG] Base path set to file:///Users/strucka/Projects/bunny/rabix-cli/config/core.properties
[2018-01-10 14:25:29.128] [DEBUG] Base path set to file:///Users/strucka/Projects/bunny/rabix-cli/config/core.properties
[2018-01-10 14:25:29.431] [DEBUG] Base path set to file:///Users/strucka/Projects/bunny/rabix-cli/config/core.properties
[2018-01-10 14:25:30.040] [DEBUG] Initialized ContextRecordCache with size 1000
[2018-01-10 14:25:30.040] [DEBUG] DAGCache initialized with size=16
[2018-01-10 14:25:30.041] [DEBUG] JobStatsCache initialized with size=2000
[2018-01-10 14:25:30.042] [DEBUG] ApplicationCache initialized with size=16
[2018-01-10 14:25:30.445] [DEBUG] Internal logging successfully configured to commons logger: true
[2018-01-10 14:25:30.492] [DEBUG] Admin mbean registered under com.amazonaws.management:type=AwsSdkMetrics
[2018-01-10 14:25:33.972] [ERROR] Encountered an error while starting local backend.
org.rabix.engine.service.BootstrapServiceException: com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Error injecting constructor, java.nio.file.FileSystemNotFoundException: S3 filesystem not yet created. Use newFileSystem() instead
  at org.rabix.backend.tes.service.impl.LocalTESStorageServiceImpl.<init>(LocalTESStorageServiceImpl.java:39)
  while locating org.rabix.backend.tes.service.impl.LocalTESStorageServiceImpl
  at org.rabix.backend.tes.TESModule.configure(TESModule.java:23) (via modules: org.rabix.cli.BackendCommandLine$1 -> org.rabix.backend.tes.TESModule)
  while locating org.rabix.backend.tes.service.TESStorageService
    for field at org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl.storage(LocalTESWorkerServiceImpl.java:79)
  while locating org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl

1 error
    at org.rabix.engine.service.impl.BootstrapServiceImpl.start(BootstrapServiceImpl.java:47) ~[classes/:na]
    at org.rabix.cli.BackendCommandLine.main(BackendCommandLine.java:360) ~[classes/:na]
Caused by: com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Error injecting constructor, java.nio.file.FileSystemNotFoundException: S3 filesystem not yet created. Use newFileSystem() instead
  at org.rabix.backend.tes.service.impl.LocalTESStorageServiceImpl.<init>(LocalTESStorageServiceImpl.java:39)
  while locating org.rabix.backend.tes.service.impl.LocalTESStorageServiceImpl
  at org.rabix.backend.tes.TESModule.configure(TESModule.java:23) (via modules: org.rabix.cli.BackendCommandLine$1 -> org.rabix.backend.tes.TESModule)
  while locating org.rabix.backend.tes.service.TESStorageService
    for field at org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl.storage(LocalTESWorkerServiceImpl.java:79)
  while locating org.rabix.backend.tes.service.impl.LocalTESWorkerServiceImpl

1 error
    at com.google.inject.internal.Errors.throwProvisionExceptionIfErrorsExist(Errors.java:486) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:67) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.InjectorImpl.injectMembers(InjectorImpl.java:987) ~[guice-4.1.0.jar:na]
Disconnected from the target VM, address: '127.0.0.1:62023', transport: 'socket'
    at org.rabix.engine.service.impl.BackendServiceImpl.scanEmbedded(BackendServiceImpl.java:82) ~[classes/:na]
    at org.rabix.engine.service.impl.BootstrapServiceImpl.start(BootstrapServiceImpl.java:45) ~[classes/:na]
    ... 1 common frames omitted
Caused by: java.nio.file.FileSystemNotFoundException: S3 filesystem not yet created. Use newFileSystem() instead
    at com.upplication.s3fs.S3FileSystemProvider.getFileSystem(S3FileSystemProvider.java:278) ~[s3fs-2.2.1.jar:na]
    at com.upplication.s3fs.S3FileSystemProvider.getPath(S3FileSystemProvider.java:296) ~[s3fs-2.2.1.jar:na]
    at java.nio.file.Paths.get(Paths.java:143) ~[na:1.8.0_152]
    at org.rabix.backend.tes.service.impl.LocalTESStorageServiceImpl.<init>(LocalTESStorageServiceImpl.java:61) ~[classes/:na]
    at org.rabix.backend.tes.service.impl.LocalTESStorageServiceImpl$$FastClassByGuice$$148bf36a.newInstance(<generated>) ~[classes/:na]
    at com.google.inject.internal.DefaultConstructionProxyFactory$FastClassProxy.newInstance(DefaultConstructionProxyFactory.java:89) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:111) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:54) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:132) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.MembersInjectorImpl$1.call(MembersInjectorImpl.java:93) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.MembersInjectorImpl$1.call(MembersInjectorImpl.java:80) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.MembersInjectorImpl.injectAndNotify(MembersInjectorImpl.java:80) ~[guice-4.1.0.jar:na]
    at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:62) ~[guice-4.1.0.jar:na]
    ... 4 common frames omitted

milos-ljubinkovic commented 6 years ago

Yeah, sorry about that. I was so focused on staging files that I completely forgot to test the case where they are already on S3.

I moved the remote filesystem initialization earlier in the program flow on this branch: https://github.com/rabix/bunny/tree/bugfix/s3-files
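Why the order of initialization matters can be illustrated with the JDK's built-in zip provider, which behaves in a comparable way: `Paths.get(URI)` fails with `FileSystemNotFoundException` until `FileSystems.newFileSystem` has been called for that file system. This is only an analogy under that assumption; it does not use the s3 lib, and the names `InitBeforeLookup`, `canResolve` and `demo` are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

public class InitBeforeLookup {
    /** Returns true if Paths.get(uri) succeeds, false if its file system does not exist yet. */
    static boolean canResolve(URI uri) {
        try {
            Paths.get(uri);
            return true;
        } catch (FileSystemNotFoundException e) {
            return false;
        }
    }

    /** Returns {resolvable before init, resolvable after init}. */
    static boolean[] demo() throws IOException {
        // A zip file that does not exist yet, inside a fresh temp directory.
        Path zip = Files.createTempDirectory("fs-demo").resolve("demo.zip");
        URI fsUri = URI.create("jar:" + zip.toUri());
        URI entry = URI.create(fsUri + "!/input.txt");

        // Lookup before the file system has been created.
        boolean before = canResolve(entry);

        boolean after;
        // Create the file system first, then repeat the same lookup.
        try (FileSystem fs = FileSystems.newFileSystem(
                fsUri, Collections.singletonMap("create", "true"))) {
            after = canResolve(entry);
        }
        return new boolean[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = demo();
        System.out.println("before init: " + result[0] + ", after init: " + result[1]);
    }
}
```

The same lookup fails before the file system is created and succeeds afterwards, which is the pattern the "Use newFileSystem() instead" message in the trace points at.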

Also note that the s3 lib supports either

s3://[endpoint]/[bucket]/[path]

or

s3:///[bucket]/[path], which assumes the endpoint is s3.amazonaws.com

Plain s3://[bucket]/[path] isn't supported for now because of the way URIs work in Java, but I could add the endpoint as the default host of such URIs in the future.
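The URI point can be checked with `java.net.URI` directly. A small sketch (the class name `S3UriForms` and the helper `describe` are made up for illustration) showing where the bucket ends up in each of the three forms:

```java
import java.net.URI;

public class S3UriForms {
    /** Describes how java.net.URI splits a location into host and path. */
    static String describe(String location) {
        URI uri = URI.create(location);
        return "host=" + uri.getHost() + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // Explicit endpoint: the host is the endpoint, the bucket is the first path segment.
        System.out.println(describe("s3://s3.amazonaws.com/funnel-test/NA12878.bam"));
        // Empty authority: no host at all, the bucket is again the first path segment.
        System.out.println(describe("s3:///funnel-test/NA12878.bam"));
        // Bucket directly after "s3://": the bucket name lands in the host slot.
        System.out.println(describe("s3://funnel-test/NA12878.bam"));
    }
}
```

In the third form the parser cannot tell a bucket from an endpoint, so code that treats the host as the endpoint would read `funnel-test` as a server name.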

adamstruck commented 6 years ago

I am still getting the same errors using https://github.com/rabix/bunny/tree/bugfix/s3-files

milos-ljubinkovic commented 6 years ago

What is your rabix.tes.storage.base URL?

adamstruck commented 6 years ago

rabix.tes.storage.base=s3://funnel-test/rabix

milos-ljubinkovic commented 6 years ago

Try it with s3://[endpoint]/[bucket]/[path]

or

s3:///[bucket]/[path]

Plain s3://[bucket]/[path] isn't supported for now because of the way URIs work in Java, but I could add the endpoint as the default host of such URIs in the future.

adamstruck commented 6 years ago

Neither of those alternatives did the trick. It still fails at this line with the same error: https://github.com/rabix/bunny/blob/bugfix/s3-files/rabix-cli/src/main/java/org/rabix/cli/BackendCommandLine.java#L388

milos-ljubinkovic commented 6 years ago

It works on my side as long as the input URLs use the same endpoint as the storage base. Maybe that is the issue?

input:

{
  "input_file": {
    "class": "File",
    "location": "s3://s3.us-east-2.amazonaws.com/testbunny/input.txt",
    "path": "data/input.txt"
  }
}

config: rabix.tes.storage.base=s3://s3.us-east-2.amazonaws.com/testbunny/

output:

{
  "output_protein" : {
    "basename" : "protein.txt",
    "checksum" : "sha1$55adf0ec2ecc6aee57a774d48216ac5a97d6e5ba",
    "class" : "File",
    "contents" : null,
    "dirname" : "s3://s3.us-east-2.amazonaws.com/testbunny/c27041cd-d0ec-45c4-8859-8b706e15fd88/root/Translate/",
    "format" : null,
    "location" : "s3://s3.us-east-2.amazonaws.com/testbunny/c27041cd-d0ec-45c4-8859-8b706e15fd88/root/Translate/protein.txt",
    "metadata" : null,
    "nameext" : ".txt",
    "nameroot" : "protein",
    "path" : "/testbunny/c27041cd-d0ec-45c4-8859-8b706e15fd88/root/Translate/protein.txt",
    "secondaryFiles" : [ ],
    "size" : 9
  }
}

Tested with examples/dna2protein

It also works with s3://s3.amazonaws.com/testbunny/ and s3:///testbunny/, provided the same form is used in both the config and the inputs; exceptions happen when you mix and match. I'll see about making the s3 lib endpoint-agnostic in the future.
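Until the lib is endpoint-agnostic, a mismatch could be caught before a run with a simple check. The sketch below is hypothetical and not part of bunny or the s3 lib; it assumes that "same endpoint" means the two URIs have the same host component, which is an interpretation of the behaviour described above rather than something confirmed in the code.

```java
import java.net.URI;
import java.util.Objects;

public class EndpointCheck {
    /**
     * Returns true when the storage base and an input location share the same
     * URI host. A missing host (the s3:/// form) only matches another missing host.
     */
    static boolean sameEndpoint(String storageBase, String inputLocation) {
        String baseHost = URI.create(storageBase).getHost();
        String inputHost = URI.create(inputLocation).getHost();
        return Objects.equals(baseHost, inputHost);
    }

    public static void main(String[] args) {
        // Both use the explicit regional endpoint.
        System.out.println(sameEndpoint(
                "s3://s3.us-east-2.amazonaws.com/testbunny/",
                "s3://s3.us-east-2.amazonaws.com/testbunny/input.txt"));
        // Mixed forms: default-endpoint base, explicit-endpoint input.
        System.out.println(sameEndpoint(
                "s3:///testbunny/",
                "s3://s3.amazonaws.com/testbunny/input.txt"));
    }
}
```

Under that assumption the first pair is consistent and the second is the mix-and-match case that should be rewritten to one form before running.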

adamstruck commented 6 years ago

I was unable to get your example above to work, and I am not sure what the problem is. I traced the execution in IntelliJ and confirmed that your new code is being called, but I am still getting this error: "Error injecting constructor, java.nio.file.FileSystemNotFoundException: S3 filesystem not yet created. Use newFileSystem() instead".

milos-ljubinkovic commented 6 years ago

I think I got it: my local Maven repo had a different build of the s3 lib under the same version number, so it behaved differently. I've updated the s3-files branch with this version of the lib, so it should work.

adamstruck commented 6 years ago

That did the trick.