soto-project / soto

Swift SDK for AWS that works on Linux, macOS and iOS
https://soto.codes
Apache License 2.0
880 stars 83 forks source link

Repeated MultiPartResume Causes Crash #518

Closed thatMacAdmin closed 3 years ago

thatMacAdmin commented 3 years ago

Describe the bug Using code like this to resume a multipart upload on failure can cause a crash if there are a large number of resume / retries:

    @discardableResult func resume(_ resumeRequest: S3.ResumeMultipartUploadRequest,
                                   file: URL, s3: S3, on eventLoop: EventLoop,
                                   closure: @escaping (Double) -> Void) -> EventLoopFuture<S3.CompleteMultipartUploadOutput> {

        let promise = eventLoop.makePromise(of: S3.CompleteMultipartUploadOutput.self)

        func _resume(_ resumeRequest: S3.ResumeMultipartUploadRequest) {

            self.logger.debug("Upload was interrupted. Checking if user cancelled upload")
            if self.cancelUpload {
                return promise.fail(MyError.userCancelledUpload)
            }

            s3.resumeMultipartUpload(resumeRequest, filename: file.path, abortOnFail: false) {
                closure($0)
            }
            .whenComplete { result in

                switch result {
                case .failure(S3ErrorType.multipart.abortedUpload(let resumeRequest, _)):
                    _resume(resumeRequest)

                case .failure(let error):
                    promise.fail(error)
                    self.shutdownClient(client: s3.client)

                case .success(let response):
                    promise.succeed(response)
                    self.shutdownClient(client: s3.client)
                }
            }
        }

        _resume(resumeRequest)
        return promise.futureResult
    }

    func upload(_ file: URL, prefix: String, closure: @escaping (Result<Double, Error>) -> Void) {
        self.logger.debug("Starting object upload to mercury")
        guard let credentials = self.retrieveCredentials() else {
            self.logger.error("Invalid credentials")
            return
        }
        let s3 = self.createS3Client(accessKey: credentials.username, secretKey: credentials.password)
        let request = S3.CreateMultipartUploadRequest(bucket: self.bucketList[self.environment] ?? String(),
                                                      key: "\(prefix)/\(file.lastPathComponent)")

        s3.multipartUpload(
            request,
            filename: file.path,
            abortOnFail: false
        ) {
            if self.cancelUpload == true {
                self.logger.info("User cancelled upload")
                throw MyError.userCancelledUpload
            }
            closure(.success($0))
        }
        .whenComplete { result in
            switch result {
            case .success:
                self.logger.info("Upload succeeded")
                closure(.success(1.0))
                s3.client.shutdown(queue: .main) { shutdown in }

            case .failure(S3ErrorType.multipart.abortedUpload(let resumeRequest, _)):
                if !self.cancelUpload {
                    self.logger.debug("Initial upload attempt failed. Trying again...")
                    self.resume(resumeRequest, file: file, s3: s3, on: s3.eventLoopGroup.next()) {
                        closure(.success($0))
                    }
                } else {
                    self.shutdownClient(client: s3.client)
                    closure(.failure(MyError.userCancelledUpload))
                }

            case .failure(let error):
                self.logger.debug("Hit an unknown error: \(error.localizedDescription)")
                self.shutdownClient(client: s3.client)
                closure(.failure(error))
            }
        }
    }

To Reproduce Steps to reproduce the behavior:

  1. Use the code above and start a large multipart upload
  2. Use network line conditioner or other tool to simulate poor network conditions
  3. Hit 29 or 30 retries in my experience this is enough to cause a crash

Expected behavior SDK should continue to upload parts until complete

Setup (please complete the following information):

Additional context Code fails in this block: Line 684 - S3+multipart_API.swift if let part = parts.first(where: { $0.partNumber == partNumber }) {

ERROR: Thread 106: EXC_BAD_ACCESS (code=2, address=0x30fcfcfd0)

See attached backtrace. backtrace.txt

adam-fowler commented 3 years ago

Ah you caught a situation I didn't realise could happen. The crash is a stack overflow. This is caused by the fact that when you call .map of an already succeeded EventLoopFuture the closure is called immediately inside the map function. Your crash involved having 119 calls to map on already successful EventLoopFutures which caused a massive recursive function.

I have a fix for this #519. Could you test this and see if it fixes your issue. Thank you

thatMacAdmin commented 3 years ago

Looks like https://github.com/soto-project/soto/pull/519 resolves this.