nextflow-io / nextflow-s3fs

An S3 File System Provider for Java 7 (project archived)
Apache License 2.0

Using Apache Ignite with a non-AWS S3 bucket as working directory #24

Open pbelmann opened 3 years ago

pbelmann commented 3 years ago

Hello,

my goal is to use S3 for the working directory instead of a shared file system. The ideal scenario for me would be that the remote files are staged into a scratch directory on the worker node where the process is executed, and that the results are then uploaded back to S3. It does not matter to me whether the actual executor is 'slurm', 'ignite', etc. My first attempt was to use Apache Ignite in combination with the -w parameter. However, I'm using the S3 API of Ceph, which is part of our OpenStack installation: https://docs.ceph.com/en/latest/radosgw/s3/. I created an example repository that shows my approach: https://github.com/pbelmann/ignite-s3.
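
For reference, a minimal nextflow.config sketch of this setup (the endpoint URL and credentials are placeholders for our Ceph installation, and s3PathStyleAccess is an assumption, since RadosGW deployments often require path-style requests):

    // nextflow.config -- minimal sketch, not the exact file from the example repo
    workDir = 's3://staging/staging'        // work dir used in the run below

    aws {
        accessKey = '<ceph-access-key>'     // placeholder
        secretKey = '<ceph-secret-key>'     // placeholder
        client {
            endpoint = 'https://ceph.example.org'   // RadosGW S3 endpoint (placeholder)
            s3PathStyleAccess = true                // assumption: RadosGW usually needs path-style URLs
        }
    }

    process {
        executor = 'ignite'
        scratch  = true     // run each task in a node-local scratch directory
    }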

Nextflow Version

      N E X T F L O W
      version 21.04.0 build 5552
      created 02-05-2021 16:22 UTC 
      cite doi:10.1038/nbt.3820
      http://nextflow.io

Nextflow Error reported

While the file is correctly staged to S3 by the master node, the worker node fails with this message:


Error executing process > 'runBBMapDeinterleave (test1)'

Caused by:
  java.io.IOException: No space left on device

Command executed:

  reformat.sh in=interleaved.fq.gz out1=read1.fq.gz out2=read2.fq.gz

Command exit status:
  -

Command output:
  (empty)

Work dir:
  s3://staging/staging/9d/38a8cf157159b7df900b867731c4ea

Looking at the node-nextflow.log, the actual error is the following:

May-14 07:12:44.708 [pool-2-thread-1] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
May-14 07:12:44.721 [pool-2-thread-1] DEBUG nextflow.file.FileHelper - AWS S3 config details: {}
May-14 07:12:47.444 [pool-2-thread-1] ERROR nextflow.executor.IgBaseTask - Cannot execute task > runBBMapDeinterleave (test2)
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: *********************; S3 Extended Request ID: ********************)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4914)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4860)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4854)
        at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:880)
        at com.upplication.s3fs.AmazonS3Client.listObjects(AmazonS3Client.java:105)
        at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:113)
        at com.upplication.s3fs.S3FileSystemProvider.readAttributes(S3FileSystemProvider.java:669)
        at java.base/java.nio.file.Files.readAttributes(Files.java:1764)
        at nextflow.util.CacheHelper.hashFile(CacheHelper.java:239)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:186)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:178)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:111)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:107)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:103)
        at nextflow.file.FileHelper.getLocalCachePath(FileHelper.groovy:645)
        at nextflow.executor.IgFileStagingStrategy.stage(IgFileStagingStrategy.groovy:81)
        at nextflow.executor.IgScriptStagingStrategy.super$2$stage(IgScriptStagingStrategy.groovy)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:144)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:164)
        at nextflow.executor.IgScriptStagingStrategy.stage(IgScriptStagingStrategy.groovy:55)
        at nextflow.executor.IgScriptTask.beforeExecute(IgScriptTask.groovy:56)
        at nextflow.executor.IgBaseTask.call(IgBaseTask.groovy:120)
        at nextflow.scheduler.SchedulerAgent$AgentProcessor.runTask0(SchedulerAgent.groovy:350)
        at nextflow.scheduler.SchedulerAgent$AgentProcessor$1.run(SchedulerAgent.groovy:339)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

I believe the reason for this error is an incompatibility between the Amazon S3 API and the S3 API offered by Ceph. Is there any way to get the actual S3 call that fails?
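
One option I can think of, assuming Nextflow's -trace CLI option reaches the AWS SDK loggers (in SDK v1, the com.amazonaws.request logger prints each request and response at DEBUG level), would be an invocation along these lines:

    # hypothetical invocation; main.nf and the bucket come from my example repository
    nextflow -trace com.amazonaws run main.nf -w 's3://staging/staging'

The log should then contain the HTTP method, path, and parameters of the ListObjects request that the stack trace above points at, together with the 403 response.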