nextflow-io / nextflow-s3fs

An S3 File System Provider for Java 7 (project archived)
Apache License 2.0

s3 upload failure for large file >1GB #13

Closed sb43 closed 5 years ago

sb43 commented 5 years ago

Bug report

I am getting the following error while trying to upload a large (>1 GB) file to S3.

Feb-11 23:51:50.487 [pool-4-thread-3] DEBUG com.upplication.s3fs.S3OutputStream - Failed to upload part 3 attempt 1 for bucket: methylseq, key:   somefile.bam versionId: null -- Caused by: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)

Expected behavior and actual behavior


Steps to reproduce the problem

tmpfile = Channel.fromPath('mytestfile')
process setExample {
    publishDir 's3://mybucket/test_s3new', mode: 'copy' , overwrite: true
    input:
    file myfile from tmpfile

    output:
    file myfile into tmp

    """
    echo $myfile
    """

}

Program output

Feb-11 23:51:50.487 [pool-4-thread-3] DEBUG com.upplication.s3fs.S3OutputStream - Failed to upload part 3 attempt 1 for bucket: methylseq, key: test_s3new/26415_8#1_1s.fq_000000.gz_val_1_bismark_bt2_pe.deduplicated.bam, versionId: null -- Caused by: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
Feb-11 23:51:51.744 [pool-4-thread-1] DEBUG com.upplication.s3fs.S3OutputStream - Failed to upload part 1 attempt 2 for bucket: methylseq, key: test_s3new/26415_8#1_1s.fq_000000.gz_val_1_bismark_bt2_pe.deduplicated.bam, versionId: null -- Caused by: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
Feb-11 23:51:52.036 [pool-4-thread-4] DEBUG com.upplication.s3fs.S3OutputStream - Failed
Feb-11 23:30:05.442 [pool-4-thread-8] ERROR com.upplication.s3fs.S3OutputStream - Upload: 2~zD0KjZq5in0DbQxmC-Qr5_2KlHCFX33 > Error for part: 8
Caused by: java.io.IOException: Failed to upload multipart data to Amazon S3
        at com.upplication.s3fs.S3OutputStream.uploadPart(S3OutputStream.java:439)
        at com.upplication.s3fs.S3OutputStream.access$000(S3OutputStream.java:68)
        at com.upplication.s3fs.S3OutputStream$1.run(S3OutputStream.java:345)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: The target server failed to respond
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1116)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1066)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4365)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4312)
        at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3338)
        at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3323)
        at com.upplication.s3fs.S3OutputStream.uploadPart(S3OutputStream.java:472)
        at com.upplication.s3fs.S3OutputStream.uploadPart(S3OutputStream.java:434)
        ... 7 more
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
        at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)

Environment

Additional context

It always works with smaller files, and uploading the same large files with aws s3 cp works fine.

pditommaso commented 5 years ago

Are you using a custom endpoint?

sb43 commented 5 years ago

Yes, I am using a custom endpoint (Ceph Object Gateway).

pditommaso commented 5 years ago

NF does upload files bigger than 1 GB to S3. The stack trace suggests it's a problem with the S3 backend or an incompatibility with the AWS SDK:

com.amazonaws.SdkClientException: Unable to execute HTTP request: The target server failed to respond
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1116)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1066)
sb43 commented 5 years ago

If I remove the number of retries, I get the following error. Is there any parameter in Nextflow to set setReadLimit?

DEBUG com.upplication.s3fs.S3OutputStream - Failed to upload part 1 attempt 1 for bucket: methylseq, key: test_s3new/myfile.bam, versionId: null -- Caused by: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)

Please see the AWS issue: https://github.com/aws/aws-sdk-java/issues/427

mcast commented 5 years ago

Please see the AWS issue: aws/aws-sdk-java#427

I'm helping Shriram get to the bottom of this. So far...

Do you want the issue moved to that project?

pditommaso commented 5 years ago

Yes, I've just moved the issue to the nextflow-s3fs project.

pditommaso commented 5 years ago

The upload is done with a byte buffer (wrapped by an InputStream) to avoid saving the chunks to files:

    private void uploadPart(final InputStream content, final long contentLength, final byte[] checksum, final int partNumber, final boolean lastPart)
            throws IOException {

        if (aborted) return;

        final UploadPartRequest request = new UploadPartRequest();
        request.setBucketName(objectId.getBucket());
        request.setKey(objectId.getKey());
        request.setUploadId(uploadId);
        request.setPartNumber(partNumber);
        request.setPartSize(contentLength);
        request.setInputStream(content);
        request.setLastPart(lastPart);
        request.setMd5Digest(Base64.encodeAsString(checksum));

        final PartETag partETag = s3.uploadPart(request).getPartETag();
        log.trace("Uploaded part {} with length {} for {}: {}", partETag.getPartNumber(), contentLength, objectId, partETag.getETag());
        partETags.add(partETag);

    }
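If the read limit mentioned in the error message needed to be raised, a minimal sketch (an assumption, not the current implementation) would be to set it on the part request before calling uploadPart, so the SDK can mark/reset the stream when it retries a failed part:

    import com.amazonaws.services.s3.model.UploadPartRequest;

    // Hypothetical helper, not part of S3OutputStream: raises the SDK mark/reset buffer
    // so that a part whose first attempt fails can be retried from the start of the stream.
    // The default read limit is only about 128 KB, far smaller than a multipart chunk.
    static UploadPartRequest withRetryableStream(UploadPartRequest request, long partSize) {
        // Assumes the part size fits in an int; the +1 margin follows the AWS SDK guidance
        // of setting the read limit one byte larger than the data that may be re-read.
        request.getRequestClientOptions().setReadLimit((int) partSize + 1);
        return request;
    }

In the uploadPart method above this would amount to calling request.getRequestClientOptions().setReadLimit((int) contentLength + 1) before s3.uploadPart(request).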
sb43 commented 5 years ago

Finally resolved by setting signerOverride = "S3SignerType"

pditommaso commented 5 years ago

Nice, where did you set that?

sb43 commented 5 years ago

https://www.nextflow.io/docs/latest/config.html#config-aws

aws {
    client {
        endpoint = "https://myendpoint.com"
        signerOverride = "S3SignerType"
    }
}
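For context, signerOverride presumably maps onto the AWS SDK's ClientConfiguration.setSignerOverride(String), forcing the legacy S3 (V2) signer, which older Ceph Object Gateway releases are often reported to handle better than V4. A minimal standalone sketch of the equivalent plain-SDK setup (the endpoint URL and region below are placeholders, not the values actually used):

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.client.builder.AwsClientBuilder;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class CephS3ClientSketch {
        public static void main(String[] args) {
            // Force the legacy S3 (V2) signer, as signerOverride = "S3SignerType" does above.
            ClientConfiguration clientConfig = new ClientConfiguration();
            clientConfig.setSignerOverride("S3SignerType");

            // Endpoint URL and region are placeholders for the Ceph Object Gateway in use.
            AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                    .withClientConfiguration(clientConfig)
                    .withEndpointConfiguration(
                            new AwsClientBuilder.EndpointConfiguration("https://myendpoint.com", "us-east-1"))
                    .withPathStyleAccessEnabled(true) // commonly needed for non-AWS endpoints
                    .build();

            System.out.println("Configured client: " + s3);
        }
    }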
pditommaso commented 5 years ago

Excellent.