nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Nextflow support for Openstack Swift S3 API middleware #797

Closed bosterholz closed 3 years ago

bosterholz commented 6 years ago

Bug report

Expected behavior and actual behavior

Nextflow should be able to access the Swift Object Store through the S3 API middleware, which emulates the S3 REST API.

This does not work: every attempt fails with a Gateway Time-out (HTTP 504) error.

Program output

error: Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway Time-out; Request ID: null; S3 Extended Request ID: null)

.nextflow.log:

Jul-11 15:29:05.942 [main] DEBUG nextflow.cli.Launcher - Setting HTTP proxy: [proxy.cebitec.uni-bielefeld.de, 3128]
Jul-11 15:29:05.974 [main] DEBUG nextflow.cli.Launcher - Setting HTTPS proxy: [proxy.cebitec.uni-bielefeld.de, 3128]
Jul-11 15:29:05.974 [main] DEBUG nextflow.cli.Launcher - Setting http proxy: [proxy.cebitec.uni-bielefeld.de, 3128]
Jul-11 15:29:05.974 [main] DEBUG nextflow.cli.Launcher - Setting https proxy: [proxy.cebitec.uni-bielefeld.de, 3128]
Jul-11 15:29:05.974 [main] DEBUG nextflow.cli.Launcher - $> ./nextflow s3.nf
Jul-11 15:29:06.021 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 0.30.2
Jul-11 15:29:06.028 [main] INFO  nextflow.cli.CmdRun - Launching `s3.nf` [reverent_hawking] - revision: 5575e7b6f4
Jul-11 15:29:06.038 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /home/ubuntu/nextflow/nextflow.config
Jul-11 15:29:06.038 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/ubuntu/nextflow/nextflow.config
Jul-11 15:29:06.056 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Jul-11 15:29:06.276 [main] DEBUG nextflow.Session - Session uuid: 88cfce2e-a933-41c6-a2eb-e2b5281a2f5a
Jul-11 15:29:06.276 [main] DEBUG nextflow.Session - Run name: reverent_hawking
Jul-11 15:29:06.277 [main] DEBUG nextflow.Session - Executor pool size: 16
Jul-11 15:29:06.286 [main] DEBUG nextflow.cli.CmdRun -
  Version: 0.30.2 build 4867
  Modified: 16-06-2018 17:49 UTC
  System: Linux 4.4.0-109-generic
  Runtime: Groovy 2.4.15 on OpenJDK 64-Bit Server VM 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11
  Encoding: UTF-8 (UTF-8)
  Process: 5624@nextflows3 [192.168.0.11]
  CPUs: 16 - Mem: 31.4 GB (26.8 GB) - Swap: 1,024 MB (1,024 MB)
Jul-11 15:29:06.302 [main] DEBUG nextflow.Session - Work-dir: /home/ubuntu/nextflow/work [ext2/ext3]
Jul-11 15:29:06.302 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/ubuntu/nextflow/bin
Jul-11 15:29:06.419 [main] DEBUG nextflow.Session - Session start invoked
Jul-11 15:29:06.422 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Jul-11 15:29:06.422 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Jul-11 15:29:06.517 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jul-11 15:29:06.534 [main] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
Jul-11 15:29:06.537 [main] DEBUG nextflow.Global - Using AWS credentials defined in nextflow config file
Jul-11 15:29:06.539 [main] DEBUG nextflow.file.FileHelper - AWS S3 config details: {secret_key=0c7643.., protocol=HTTPS, endpoint=openstack.cebitec.uni-bielefeld.de:8080, access_key=f12c87..}
Jul-11 15:29:06.929 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: null
Jul-11 15:29:06.929 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Jul-11 15:29:06.932 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Jul-11 15:29:06.934 [main] INFO  nextflow.executor.Executor - [warm up] executor > local
Jul-11 15:29:06.938 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=16; memory=31.4 GB; capacity=16; pollInterval=100ms; dumpInterval=5m
Jul-11 15:29:06.941 [main] DEBUG nextflow.processor.TaskDispatcher - Starting monitor: LocalPollingMonitor
Jul-11 15:29:06.942 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
Jul-11 15:29:06.944 [main] DEBUG nextflow.executor.Executor - Invoke register for executor: local
Jul-11 15:29:06.984 [main] DEBUG nextflow.Session - >>> barrier register (process: foo)
Jul-11 15:29:06.986 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > foo -- maxForks: 16
Jul-11 15:29:07.006 [main] DEBUG nextflow.script.ScriptRunner - > Await termination
Jul-11 15:29:07.006 [main] DEBUG nextflow.Session - Session await
Jul-11 15:29:07.558 [Actor Thread 3] ERROR nextflow.processor.TaskProcessor - Error executing process > 'foo'

Caused by:
  Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway Time-out; Request ID: null; S3 Extended Request ID: null)

com.amazonaws.services.s3.model.AmazonS3Exception: Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway Time-out; Request ID: null; S3 Extended Request ID: null)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1632)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4365)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4312)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4306)
        at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:840)
        at com.upplication.s3fs.AmazonS3Client.listObjects(AmazonS3Client.java:104)
        at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:117)
        at com.upplication.s3fs.S3FileSystemProvider.readAttributes(S3FileSystemProvider.java:643)
        at java.nio.file.Files.readAttributes(Files.java:1737)
        at nextflow.util.CacheHelper.hashFile(CacheHelper.java:183)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:142)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:139)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:74)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:70)
        at nextflow.util.CacheHelper.hashUnorderedCollection(CacheHelper.java:256)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:130)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:134)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:74)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:70)
        at nextflow.util.CacheHelper$hasher.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:136)
        at nextflow.processor.TaskProcessor.createTaskHashKey(TaskProcessor.groovy:1976)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:210)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:59)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:157)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:169)
        at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:676)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSite.invoke(PogoMetaMethodSite.java:169)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:71)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:128)
        at nextflow.processor.TaskProcessor$InvokeTaskAdapter.call(TaskProcessor.groovy:224)
        at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
        at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
        at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Jul-11 15:29:07.565 [Actor Thread 3] DEBUG nextflow.Session - Session aborted -- Cause: Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504 Gateway Time-out; Request ID: null; S3 Extended Request ID: null)
Jul-11 15:29:07.577 [Actor Thread 3] DEBUG nextflow.Session - The following nodes are still active:
[process] foo
  status=ACTIVE
  port 0: (value) -   ; channel: obj
  port 1: (cntrl) CLOSED; channel: $

Jul-11 15:29:07.579 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local)
Jul-11 15:29:07.580 [main] DEBUG nextflow.Session - Session await > all process finished
Jul-11 15:29:07.580 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jul-11 15:29:07.587 [main] DEBUG nextflow.trace.StatsObserver - Workflow completed > WorkflowStats[succeedCount=0; failedCount=0; ignoredCount=0; cachedCount=0; succeedDuration=0ms; failedDuration=0ms; cachedDuration=0ms]
Jul-11 15:29:07.592 [main] DEBUG nextflow.CacheDB - Closing CacheDB done
Jul-11 15:29:07.594 [main] DEBUG nextflow.Session - AWS S3 uploader shutdown
Jul-11 15:29:07.602 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

s3.nf

#!/usr/bin/env nextflow
/*
 * Copyright (c) 2013-2018, Centre for Genomic Regulation (CRG).
 * Copyright (c) 2013-2018, Paolo Di Tommaso and the respective authors.
 *
 *   This file is part of 'Nextflow'.
 *
 *   Nextflow is free software: you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation, either version 3 of the License, or
 *   (at your option) any later version.
 *
 *   Nextflow is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *   GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with Nextflow.  If not, see <http://www.gnu.org/licenses/>.
 */

/*
 * Run a process using an S3 file as input
 */

s3file = file('s3://nextflow/test.txt')

process foo {
  echo true
  input:
  file(obj) from s3file

  """
  cat $obj | head
  """
}

nextflow.config:

profiles {

  standard {
    aws {

      accessKey = 'f12****'
      secretKey = '0c7****'

      client {
        protocol = 'HTTPS'
        connectionTimeout = '2000'
        endpoint = 'https://openstack.cebitec.uni-bielefeld.de:8080'
        signerOverride = 'AWSS3V4SignerType'
      }
    }
  }
}

Working scripts

Tests with the MinIO client, s3curl.pl, and a Java test file from aws-java-sdk-1.11.365 all worked fine.

S3Sample.java:

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

public class S3Sample {
    private static String bucketName = "nextflow";
    private static String keyName = "hosts";
    private static String uploadFileName = "/etc/hosts";

    public static void main(String[] args) throws IOException {
        AWSCredentials credentials = new BasicAWSCredentials("f12*********", "0c7*********");
        ClientConfiguration clientConfiguration = new ClientConfiguration();
        clientConfiguration.setSignerOverride("AWSS3V4SignerType");

        AmazonS3 s3Client = AmazonS3ClientBuilder
                .standard()
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("https://openstack.cebitec.uni-bielefeld.de:8080",""))
                .withPathStyleAccessEnabled(true)
                .withClientConfiguration(clientConfiguration)
                .withCredentials(new AWSStaticCredentialsProvider(credentials))
                .build();

        try {
            System.out.println("Uploading a new object to S3 from a file\n");
            File file = new File(uploadFileName);
            // Upload file
            s3Client.putObject(new PutObjectRequest(bucketName, keyName, file));

            // Download file
            GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, keyName);
            S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
            System.out.println("Printing bytes retrieved:");
            displayTextInputStream(objectPortion.getObjectContent());
        } catch (AmazonServiceException ase) {
            System.out.println("Caught an AmazonServiceException, which " + "means your request made it "
                    + "to Amazon S3, but was rejected with an error response" + " for some reason.");
            System.out.println("Error Message:    " + ase.getMessage());
            System.out.println("HTTP Status Code: " + ase.getStatusCode());
            System.out.println("AWS Error Code:   " + ase.getErrorCode());
            System.out.println("Error Type:       " + ase.getErrorType());
            System.out.println("Request ID:       " + ase.getRequestId());

        } catch (AmazonClientException ace) {
            System.out.println("Caught an AmazonClientException, which " + "means the client encountered " + "an internal error while trying to "
                    + "communicate with S3, " + "such as not being able to access the network.");
            System.out.println("Error Message: " + ace.getMessage());

        }

    }

    private static void displayTextInputStream(InputStream input) throws IOException {
        // Read one text line at a time and display.
        BufferedReader reader = new BufferedReader(new InputStreamReader(input));
        while (true) {
            String line = reader.readLine();
            if (line == null)
                break;

            System.out.println("    " + line);
        }
        System.out.println();
    }
}
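
One visible difference between this sample and the nextflow.config above is the call to `.withPathStyleAccessEnabled(true)`. As a rough illustration of what that switch changes, the sketch below builds the two request URL shapes for the same bucket and key using only the JDK. It is not the SDK's own logic; the class and method names are hypothetical, and whether virtual-hosted addressing is what actually triggers the 504 here is an assumption, not something the log proves.

```java
import java.net.URI;

public class S3AddressingSketch {

    /** Path-style: the bucket is the first path segment, the host stays unchanged. */
    static URI pathStyle(URI endpoint, String bucket, String key) {
        return URI.create(endpoint.getScheme() + "://" + endpoint.getAuthority()
                + "/" + bucket + "/" + key);
    }

    /** Virtual-hosted style: the bucket becomes a subdomain of the endpoint host. */
    static URI virtualHostedStyle(URI endpoint, String bucket, String key) {
        return URI.create(endpoint.getScheme() + "://" + bucket + "."
                + endpoint.getAuthority() + "/" + key);
    }

    public static void main(String[] args) {
        // Endpoint, bucket and key taken from the report above.
        URI endpoint = URI.create("https://openstack.cebitec.uni-bielefeld.de:8080");
        System.out.println("path-style:     " + pathStyle(endpoint, "nextflow", "test.txt"));
        System.out.println("virtual-hosted: " + virtualHostedStyle(endpoint, "nextflow", "test.txt"));
    }
}
```

With virtual-hosted addressing the client would have to resolve a per-bucket hostname, which an S3-compatible store behind a single hostname may not provide.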

Environment

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

pbelmann commented 4 years ago

I can confirm that the example provided by @bosterholz works with the s_3_path_style_access = true setting in the Nextflow config. However, there is still the issue that S3 cannot be used as the work directory:

~/projects/nextflowtest> nextflow run nextflow.nf -w s3://nextflow/
N E X T F L O W  ~  version 20.04.1
Launching `nextflow.nf` [distraught_elion] - revision: 0f7cb73c3f
WARN: Local executor only supports default file system -- Check work directory: s3://nextflow/
executor >  local (1)
[1d/24d861] process > splitLetters   [100%] 1 of 1, failed: 1 ✘
[-        ] process > convertToUpper -
Error executing process > 'splitLetters'

Caused by:
  java.lang.UnsupportedOperationException

Command executed:

  printf 'Hello world!' | split -b 6 - chunk_

Command exit status:
  -

Command output:
  (empty)

Work dir:
  s3://nextflow/1d/24d8619eac0b78ebc0bd7780149266

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

The work directory structure was created in S3, containing the .command.run and .command.sh files.
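
For reference, a minimal sketch of the configuration for the read-only case. It assumes the path-style option is spelled `s3PathStyleAccess` in the `aws.client` scope (the camelCase form of the name quoted above); the endpoint and the masked keys are copied from the original report. It does not address the work directory failure shown here.

```groovy
aws {
  accessKey = 'f12****'
  secretKey = '0c7****'

  client {
    endpoint = 'https://openstack.cebitec.uni-bielefeld.de:8080'
    signerOverride = 'AWSS3V4SignerType'
    // Assumption: camelCase spelling of the path-style setting
    s3PathStyleAccess = true
  }
}
```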

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.