soto-project / soto

Swift SDK for AWS that works on Linux, macOS and iOS
https://soto.codes
Apache License 2.0

How to Optimize AWS S3 File Downloads for Multiple Objects in a Single Request #680

Closed: arenas7307979 closed this issue 1 year ago

arenas7307979 commented 1 year ago

Description: I'm frustrated when downloading multiple images from a specific folder in AWS S3, because my current implementation makes one request per image, which adds up to a large number of requests. I am looking for a way to download multiple images efficiently within a single request.

Solution: I would like to optimize the AWSdownloadFileFromS3 method so that it can download multiple objects more efficiently. Ideally, it would download multiple objects with a single request to reduce the total number of requests.

Alternatives: I have considered exploring other approaches for downloading multiple objects from AWS S3 efficiently with a single request.

Additional Context: N/A

Code:

import Foundation
import NIO
import NIOFoundationCompat  // provides ByteBuffer.readData(length:)
import SotoS3

// Hypothetical error type: the original snippet used MyError without defining it.
enum MyError: Error {
    case invalidURL
    case emptyFile
}

/// Downloads a single object addressed by a virtual-hosted-style URL
/// (https://<bucket>.s3.<region>.amazonaws.com/<key>) and returns its contents as Data.
func AWSdownloadFileFromS3(s3: S3, urlString: String) -> EventLoopFuture<Data> {
    let runOnEventLoop = s3.client.eventLoopGroup.next()

    // The bucket is the first label of the host; the key is the path without its leading "/".
    // Report a bad URL as a failed future instead of crashing with fatalError.
    guard let url = URL(string: urlString),
          let host = url.host,
          let bucket = host.components(separatedBy: ".").first else {
        return runOnEventLoop.makeFailedFuture(MyError.invalidURL)
    }
    let key = String(url.path.dropFirst())

    // Accumulates the streamed chunks of the object body.
    var byteBufferCollate = ByteBufferAllocator().buffer(capacity: 0)

    let getObjectRequest = S3.GetObjectRequest(bucket: bucket, key: key)
    let getObjectFuture = s3.getObjectStreaming(getObjectRequest, on: runOnEventLoop) { byteBuffer, eventLoop in
        var byteBuffer = byteBuffer
        byteBufferCollate.writeBuffer(&byteBuffer)
        return eventLoop.makeSucceededFuture(())
    }

    // Once the whole body has arrived, convert the collected bytes to Data.
    return getObjectFuture.flatMapThrowing { _ -> Data in
        guard byteBufferCollate.readableBytes > 0,
              let data = byteBufferCollate.readData(length: byteBufferCollate.readableBytes) else {
            throw MyError.emptyFile
        }
        return data
    }
}
adam-fowler commented 1 year ago

Are you looking to use one S3 request to download multiple files, or a single function (which may make multiple requests to S3)?

I don't know of any way to do the first option, since each S3 GetObject request returns a single object. The second is easier. The project https://github.com/soto-project/soto-s3-file-transfer can download multiple files concurrently to your filesystem. Also, if you use the new Swift concurrency (async/await) versions of the S3 APIs, it is fairly easy to download multiple files concurrently using TaskGroups.
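A minimal sketch of the TaskGroup approach, assuming Swift 5.7 or later: the function below downloads a list of keys concurrently with a throwing task group and keeps at most maxConcurrent downloads in flight. The single-object download is passed in as a closure, so the function itself only depends on Foundation. The names downloadConcurrently, maxConcurrent and fetch are hypothetical and are not part of Soto.

```swift
import Foundation

/// Hypothetical helper (not part of Soto): downloads every key concurrently using a
/// task group, with at most `maxConcurrent` downloads running at once.
/// `fetch` performs one download, for example a single S3 GetObject call.
func downloadConcurrently(
    keys: [String],
    maxConcurrent: Int = 8,
    fetch: @escaping @Sendable (String) async throws -> Data
) async throws -> [String: Data] {
    precondition(maxConcurrent > 0, "maxConcurrent must be positive")
    return try await withThrowingTaskGroup(of: (String, Data).self) { group in
        var results: [String: Data] = [:]
        var pending = keys.makeIterator()

        // Start the first batch of downloads.
        for _ in 0..<maxConcurrent {
            guard let key = pending.next() else { break }
            group.addTask {
                let data = try await fetch(key)
                return (key, data)
            }
        }

        // Each time a download finishes, store it and start the next pending key.
        // If any download throws, the error propagates and the remaining tasks are cancelled.
        while let finished = try await group.next() {
            results[finished.0] = finished.1
            if let nextKey = pending.next() {
                group.addTask {
                    let data = try await fetch(nextKey)
                    return (nextKey, data)
                }
            }
        }
        return results
    }
}
```

With Soto's async API, fetch would wrap one s3.getObject call per key and convert the response body to Data. The body type differs between Soto versions (AWSPayload in Soto 6, AWSHTTPBody in Soto 7), so check how to collect it for the version you use. Note that this does not reduce the number of requests, which stays at one GetObject per object; the cap only keeps a large folder from opening hundreds of connections at once.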