terricain / aioboto3

Wrapper to use boto3 resources with the aiobotocore async backend
Apache License 2.0

"upload_fileobj" downloads entire file before uploading #273

Closed anton-zelenskiy closed 3 months ago

anton-zelenskiy commented 2 years ago

Description

Hi! When I try to upload a stream, the entire file is read into memory before uploading. Is there any way to fix this?

Here are the results from the memory profiler (https://github.com/bloomberg/memray) for a file of about 88 MB:

Allocations results for test_upload_big_file_between_buckets:
📦 Total memory allocated: 96.7MiB
📏 Total allocations: 372513
📊 Histogram of allocation sizes: |█ ▄ |
🥇 Biggest allocating functions:

There is no such problem in the sync version, boto3.
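For reference, the sync call that behaves well looks roughly like this (a sketch; bucket and key names are illustrative):

import boto3

# Sync comparison: boto3's upload_fileobj reads the source stream in
# parts via s3transfer, so the whole object is never held in memory.
s3 = boto3.client('s3')
source = s3.get_object(Bucket='source-bucket', Key='filename')
s3.upload_fileobj(source['Body'], Bucket='bucket', Key='filename')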

What I Did

import aioboto3

session = aioboto3.Session()

async with session.client(
    's3',
    aws_access_key_id=<>,
    aws_secret_access_key=<>,
    region_name=<>,
) as s3_client:
    # `source` is the response of a prior get_object call on the source object
    async with source['Body'] as raw_stream:
        await s3_client.upload_fileobj(
            raw_stream,
            Bucket='bucket',
            Key='filename',
        )
terricain commented 2 years ago

You'll want to tweak these values: https://github.com/terrycain/aioboto3/blob/master/aioboto3/s3/inject.py#L202. It's possible s3transfer now uses different values since I copied them years ago; see if setting the max io queue to 2 does anything?

The chunk size for multipart uploads is 8 MB, and it'll read up to 100 chunks before pausing, so I'm not too surprised.
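A minimal sketch of what that tweak could look like, assuming your aioboto3 version accepts a boto3 TransferConfig via the Config argument (check aioboto3/s3/inject.py for the values your release actually honours):

from boto3.s3.transfer import TransferConfig

# Cap each multipart part at 8 MB and buffer at most 2 parts in
# memory before the reader pauses (starting values to tune, not gospel).
config = TransferConfig(
    multipart_chunksize=8 * 1024 * 1024,
    max_io_queue=2,
)

await s3_client.upload_fileobj(
    raw_stream,
    Bucket='bucket',
    Key='filename',
    Config=config,
)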

rlindsberg commented 11 months ago

@anton-zelenskiy

Hi! When I try to upload a stream, the entire file is read into memory before uploading. Is there any way to fix this?

This is not always true. With the default settings, roughly 80 MB is loaded into memory and then uploaded. If you are uploading a file smaller than 80 MB, then yes, the entire file is loaded into memory. But for files larger than 80 MB it's a streaming algorithm, so there is nothing wrong with the code.
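The arithmetic behind that 80 MB figure, assuming the 8 MiB default part size and a 10-part in-memory buffer (the 10 is inferred from the description above, not a verified constant; defaults vary by version):

from boto3.s3.transfer import TransferConfig

cfg = TransferConfig()  # multipart_chunksize defaults to 8 MiB
buffered_parts = 10     # assumed in-memory part limit
print(cfg.multipart_chunksize * buffered_parts // (1024 * 1024))  # -> 80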

terricain commented 3 months ago

s3.upload_fileobj now mirrors the s3transfer implementation more closely. As for the initial issue, this is working as designed.