terricain / aioboto3

Wrapper to use boto3 resources with the aiobotocore async backend
Apache License 2.0
732 stars 75 forks source link

Make DEFAULT_SESSION use be thread-safe #228

Closed ruhan closed 3 years ago

ruhan commented 3 years ago

I have an implementation that uses threads to distribute calls to S3 resources. When using several threads at once, lots of errors emerged.

I figured out that it was because of this:

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/session.html#multithreading-or-multiprocessing-with-sessions

Meaning that Sessions can't be reused by different threads.

To solve this what I did was to change the way we use the global variable DEFAULT_SESSION to make it possible to have one default session for each thread.

sadraque4k commented 3 years ago

+1 for this one!

terricain commented 3 years ago

Considering this is an async project, is there a specific reason for using aioboto3 in multiple threads over multiple tasks in the event loop, or using boto3 in threads?

I don't at the moment see the use case.

ruhan commented 3 years ago

Hey @terrycain ,

Thanks for your answer.

Let me try to explain my case. Suppose you have a task that is both: IO-bound and CPU-bound at the same time. For example, a task that gets lots of data from S3, organizes it in a table and executes a classification algorithm that uses some vector operations on all this data. In a kind of pyseudocode it could be like:

async def task(data_chunk):
    data = await get_data_from_s3_io_bound()
    return execute_complex_vector_operation_cpu_bound(data)

Now, think that you have lots of tasks like this to execute and that they can be carried on in parallel.

In this case, you are going to have something like this in your pyseudocode:

import concurrent

def execute_tasks_in_parallel():
    chunks = range(1000)
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        tasks = [ pool.submit(task, chunk) for chunk  in chunks ]

        for task_completed in concurrent.futures.as_completed(tasks):
            do_something_with_task_result(task_completed.result())

Thus, we are going to have different threads trying to access the same DEFAULT_SESSION variable under the hood, and it crashes the execution sometimes, concluding as a scenario where we need a different session for each thread, otherwise, it's going to crash.

I don't know if anyone else is experiencing this something like this, because it can be too specific, but I think thread-safety characteristic is something this library should have as its property.

Feel free to discord at any point or even to clarify whatever you want to, I may be missing some points here because it is the first time I work with aioboto3.

Thanks for the patience in analyzing this PR.

ruhan commented 3 years ago

Hey @terrycain , can you take a look at my answer here?

terricain commented 3 years ago

Ok so these changes have the capability to store infinite copies of the session, especially long after a thread has ended which isn't great for memory usage.

Is there anything stopping you from creating a Session() object yourself and using that instead of relying on the global one.

One of the things I've been contemplating recently is actually dropping the entire global session as it can present problems for frameworks that stop the event loop and create new ones (e.g. aws chalice)

ruhan commented 3 years ago

Hey, I think that the idea of dropping the whole global session is a better one, but we could have the solution I gave merged in the meantime. It is up to you.

On Thu, May 20, 2021 at 4:19 PM Terry Cain @.***> wrote:

Ok so these changes have the capability to store infinite copies of the session, especially long after a thread has ended which isn't great for memory usage.

Is there anything stopping you from creating a Session() object yourself and using that instead of relying on the global one.

One of the things I've been contemplating recently is actually dropping the entire global session as it can present problems for frameworks that stop the event loop and create new ones (e.g. aws chalice)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/terrycain/aioboto3/pull/228#issuecomment-845409466, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABS7Q7BHCRHG6MYFY3KVX3TOVOCZANCNFSM42TFHLDQ .

terricain commented 3 years ago

9.0.0 is out, there is no default session anymore.

ruhan commented 3 years ago

Awesome, thanks!

On Sun, Jun 27, 2021, 3:44 PM Terry Cain @.***> wrote:

9.0.0 is out, there is no default session anymore.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/terrycain/aioboto3/pull/228#issuecomment-869208096, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABS7Q5CROHVRSX4YBAASHLTU5WPHANCNFSM42TFHLDQ .