terricain / aioboto3

Wrapper to use boto3 resources with the aiobotocore async backend
Apache License 2.0

Since 13.0.0 async streaming uploads don't read the whole stream. #340

Closed: szymonzmilczakpandadoc closed this issue 1 month ago

szymonzmilczakpandadoc commented 1 month ago

Description

I'm trying to stream-download a file from a URL and stream-upload it to S3. In 12.4.0 this code works fine. In 13.0.0 it uploads only the first couple hundred bytes.

What I Did

import asyncio

import aioboto3
import aiohttp

async def main():
    url = "https://pdfobject.com/pdf/sample.pdf"  # size: 18810 bytes
    boto_session = aioboto3.Session()
    async with aiohttp.ClientSession() as http_session:
        async with http_session.get(url) as response:
            async with boto_session.client("s3") as s3:
                # Stream the HTTP response body straight into S3.
                await s3.upload_fileobj(
                    Fileobj=response.content, Bucket="bucket", Key="sample.pdf"
                )
                # On 13.0.0 this uploads only about the first 899 bytes.

asyncio.run(main())
terricain commented 1 month ago

Yeah, something is wrong here; I'm having a look.

terricain commented 1 month ago

OK, I've fixed it. I'll release it later tonight, as I want to introduce some tests that exercise this behaviour first.

What happened is there was a naive assumption that .read(num_bytes) returns at most num_bytes, and that a short read meant the stream was exhausted. That is very much not the case: .read(...) only returns b'' when there is nothing left to consume, and a short read can simply mean more data hasn't arrived yet. So with an aiohttp stream returning fewer bytes than the multipart threshold, the code took the quick path of a single .put_object(...) instead of the multipart dance. The fix is simple enough: loop and consume data until either EOF or the multipart threshold is reached, then continue.
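For illustration, the corrected read loop amounts to something like this (a minimal sketch, not the actual aioboto3 code; the helper name and chunk size are made up for the example):

async def read_at_least(stream, target_size, chunk_size=8192):
    """Read from an async stream until EOF or at least target_size bytes.

    A short .read() does not mean EOF; only an empty result (b'') does,
    so keep reading until enough data is buffered or the stream is exhausted.
    """
    buf = bytearray()
    while len(buf) < target_size:
        chunk = await stream.read(min(chunk_size, target_size - len(buf)))
        if not chunk:  # b'' signals EOF
            break
        buf.extend(chunk)
    return bytes(buf)

With a loop like this the decision is sound: if the buffer is still shorter than the multipart threshold afterwards, the stream really is exhausted and a single .put_object(...) is safe; otherwise the upload falls through to the multipart path.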

terricain commented 1 month ago

This should be fixed in v13.0.1
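Upgrading past the broken release picks up the fix; assuming pip as the installer, something like:

pip install --upgrade "aioboto3>=13.0.1"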

szymonzmilczakpandadoc commented 1 month ago

Thank you! I'll test once you publish 13.0.1.

terricain commented 1 month ago

It's out :)