pynamodb / PynamoDB

A pythonic interface to Amazon's DynamoDB
http://pynamodb.readthedocs.io
MIT License

Async python support: aiopynamodb #802

Open kamadorueda opened 4 years ago

kamadorueda commented 4 years ago

https://github.com/aio-libs/aiobotocore https://github.com/terrycain/aioboto3

garrettheel commented 4 years ago

The discussion here is relevant: https://github.com/pynamodb/PynamoDB/issues/525#issuecomment-607892448

I'd like to support asyncio natively in the library, but I'm still a little hesitant to adopt aiobotocore right now, as it's not maintained by AWS. We don't rely on all that much of botocore right now, so one option would be to drop it altogether and provide a separate async interface.

dwatkinsweb commented 4 years ago

Any idea when this might happen? We could really use this feature right now. I've been attempting to do this myself but I've been having to duplicate a lot of your code for a few small changes.

kamadorueda commented 4 years ago

There is another approach that is used by many libraries out there (keep reading for examples):

When a library exposes a high-latency function, for instance:

for item in TestModel.view_index.query(1):
    print("Item queried from index: {0}".format(item))

One can wrap the calls in a sub-thread via loop.run_in_executor.
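For reference, the loop.run_in_executor wrapping described above can be sketched with only the standard library. This is a minimal sketch, not PynamoDB code: `slow_query` and `query_in_thread` are hypothetical names, and `slow_query` merely stands in for a blocking call such as TestModel.view_index.query. The sketch also assumes that query results are iterated lazily (as PynamoDB result iterators appear to be), so it consumes them inside the worker thread rather than on the event loop.

```python
import asyncio
import time


def slow_query(hash_key):
    """Hypothetical stand-in for a blocking call such as TestModel.view_index.query."""
    time.sleep(0.1)  # simulate network latency
    return ["item-{0}-{1}".format(hash_key, n) for n in range(3)]


async def query_in_thread(hash_key):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    # list(...) consumes the result inside the worker thread, so a lazy
    # iterator cannot block the event loop later while it is being iterated.
    return await loop.run_in_executor(None, lambda: list(slow_query(hash_key)))


async def main():
    for item in await query_in_thread(1):
        print("Item queried from index: {0}".format(item))
```

Calling asyncio.run(main()) runs the example; the event loop stays free for other tasks while the worker thread sleeps.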

Since that's a little verbose, there are nice libraries that make it human-friendly, for example aioextensions.

So the syntax would be something like:

from aioextensions import in_thread

for item in await in_thread(TestModel.view_index.query, 1):
    print("Item queried from index: {0}".format(item))

Which would run the high-latency thing in a sub-thread that allows for concurrency.

It's a very minimalistic interface and requires no work from pynamodb since it's on the consumer side to do the wrapping:

from aioextensions import in_thread, collect

# Equivalent to pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4)
one_query = await in_thread(pynamodb_func, arg_1, arg_2, kwarg_a=3, kwarg_b=4)

# Equivalent to pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4) but all queries concurrently (overlapping in time) and fast!!
many_queries = await collect([
    in_thread(pynamodb_func, arg_1, arg_2, kwarg_a=kwarg_a, kwarg_b=kwarg_b)
    for arg_1, arg_2, kwarg_a, kwarg_b in [long list of things to fetch]
])

Another alternative is to provide _async versions of the functions, which could use the mentioned wrappers internally and hide them from the end user:

def pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4) -> Data:
    ...

async def async_pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4) -> Data:
    return await in_thread(pynamodb_func, arg_1, arg_2, kwarg_a=kwarg_a, kwarg_b=kwarg_b)
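Writing each async twin by hand duplicates every signature, so one possible variation is to derive the async flavor from the sync one with a small decorator. The sketch below swaps in the standard library's asyncio.to_thread (Python 3.9+) in place of aioextensions' in_thread; `to_async` is a hypothetical helper and `pynamodb_func` is a dummy stand-in, neither is part of PynamoDB.

```python
import asyncio
import functools


def to_async(func):
    """Return an async wrapper that runs the blocking `func` in a worker thread."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        # asyncio.to_thread requires Python 3.9 or newer.
        return await asyncio.to_thread(func, *args, **kwargs)

    return wrapper


def pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4):
    """Hypothetical blocking function, standing in for a real PynamoDB call."""
    return (arg_1, arg_2, kwarg_a, kwarg_b)


# The async flavor is derived from the sync one instead of being written by hand.
async_pynamodb_func = to_async(pynamodb_func)
```

Inside a coroutine this would be used as `await async_pynamodb_func(1, 2, kwarg_b=5)`. The sync function stays untouched, so existing callers are not affected.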

The library also offers some nice helpers that we could find useful like workers, batching and rate limits.

I think I'm volunteering to implement the async wrappers if you think it's a nice approach, you tell me! @garrettheel

These are examples of the mentioned sub-thread wrapping:

I've personally used it in production and the benefits from concurrency are worth the small overhead it adds to every call

It's common to use a_, async_ or _async notation when both flavors are offered by a library

garrettheel commented 4 years ago

loop.run_in_executor is an interesting approach, but I have tried it before and seen performance issues in high-throughput applications. Introducing threads also introduces new and interesting failure modes that didn't exist before. I'd be concerned about going down that path, especially since the vast majority of users would still use the sync interface and pay that tax.
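For readers who go the thread route anyway, one common way to keep thread usage predictable is a dedicated, bounded executor instead of the loop's default one. This is only a sketch under assumptions: `MAX_WORKERS`, `blocking_call`, and `fetch_all` are hypothetical names, and capping the pool limits the number of threads but does not remove the per-call overhead or the failure modes mentioned above.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap; the right value depends on the workload.
MAX_WORKERS = 4
executor = ThreadPoolExecutor(max_workers=MAX_WORKERS, thread_name_prefix="dynamo")


def blocking_call(n):
    """Hypothetical stand-in for a blocking PynamoDB operation."""
    time.sleep(0.05)  # simulate network latency
    return n, threading.current_thread().name


async def fetch_all(values):
    loop = asyncio.get_running_loop()
    # Every call is queued on the dedicated pool; at most MAX_WORKERS run at once.
    futures = [loop.run_in_executor(executor, blocking_call, v) for v in values]
    # gather returns results in the same order as the submitted futures.
    return await asyncio.gather(*futures)
```

Code that only uses the sync interface never touches the executor, so in this arrangement the thread cost is paid only by callers that opt in.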

I've been experimenting with a different approach in https://github.com/pynamodb/PynamoDB/pull/853, which could be characterized as a hackier version of the above suggestion (to the benefit of not requiring threads).

brunobelloni commented 1 year ago

This can also be done using plain asyncio, and the calling code will already be prepared for an eventual real async PynamoDB. Working on Python 3.9.14+ (asyncio.to_thread was added in Python 3.9).

asyncio.to_thread uses ThreadPoolExecutor under the hood

import asyncio

async def main():
    # Equivalent to pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4)
    one_query = await asyncio.to_thread(pynamodb_func, arg_1, arg_2, kwarg_a=3, kwarg_b=4)

    # Equivalent to pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4) but all queries concurrently (overlapping in time) and fast!!
    many_queries = await asyncio.gather(*[
        asyncio.to_thread(pynamodb_func, arg_1, arg_2, kwarg_a=kwarg_a, kwarg_b=kwarg_b)
        for arg_1, arg_2, kwarg_a, kwarg_b in [long list of things to fetch]
    ])

if __name__ == '__main__':
    asyncio.run(main())

aaronclong commented 1 year ago

Would it be possible to create a separate async module in this library and create a similar but async api for people to use?

There are a few third-party async Dynamo/boto3 libraries available. They could be used until Amazon finally updates boto3 to support asyncio (😔 cries from botocore maintainer).

I think this approach has a lot of benefits. PynamoDB will have a working async module when boto3 supports it, and if designed correctly, could be swapped out with these third party libs dynamically. Would the maintainer be okay with that?
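To make the "swappable" idea concrete, here is a rough sketch of how an async module could depend on a small backend interface rather than on a specific client library. Every name here (`AsyncBackend`, `InMemoryBackend`, `AsyncTable`) is hypothetical and invented for illustration; nothing below is an existing PynamoDB, boto3, or aiobotocore API.

```python
import asyncio
from typing import Any, Dict, Optional, Protocol


class AsyncBackend(Protocol):
    """Hypothetical interface that an async transport would have to satisfy."""

    async def put_item(self, table: str, key: str, item: Dict[str, Any]) -> None:
        ...

    async def get_item(self, table: str, key: str) -> Optional[Dict[str, Any]]:
        ...


class InMemoryBackend:
    """Toy backend that only exists to make the sketch runnable."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, Dict[str, Any]]] = {}

    async def put_item(self, table: str, key: str, item: Dict[str, Any]) -> None:
        self._tables.setdefault(table, {})[key] = dict(item)

    async def get_item(self, table: str, key: str) -> Optional[Dict[str, Any]]:
        return self._tables.get(table, {}).get(key)


class AsyncTable:
    """Depends only on the AsyncBackend protocol, so the backend can be swapped."""

    def __init__(self, name: str, backend: AsyncBackend) -> None:
        self._name = name
        self._backend = backend

    async def put(self, key: str, item: Dict[str, Any]) -> None:
        await self._backend.put_item(self._name, key, item)

    async def get(self, key: str) -> Optional[Dict[str, Any]]:
        return await self._backend.get_item(self._name, key)
```

Under this design, a backend built on a third-party async client today and one built on a future asyncio-capable boto3 would both just need to implement the same two coroutines.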

aaronclong commented 1 year ago

@tasn I notice you tried to do this with threading: https://github.com/pynamodb/PynamoDB/pull/968

abend-arg commented 1 year ago

I am working on a project that will benefit from adding async support to this package. We will implement our solution by basically wrapping everything you have with Gevent. Why Gevent? Because you do not need to worry about async/await syntax or rewrite everything as async methods.

We will probably implement this before June, so as soon as I get some results from it, I will come back with a PR implementing it.

In the meantime, I would really appreciate some feedback. To give you more context: Gevent is great, but, for example, its support for Windows is limited:

http://www.gevent.org/install.html#supported-platforms

It will probably also narrow the range of Python versions that your library currently supports.

aaronclong commented 1 year ago

@AbendGithub I think long-term async/await is the future of python, though. Gevent isn't native or widely used by most python programmers.

ikonst commented 1 year ago

We use pynamodb with gevent pretty much everywhere at Lyft without any modifications to this library (with standard gevent monkey-patching).

There's been a lot of community interest in adding an asyncio layer to this library over the years. It's not entirely trivial and will probably result in lots of duplication (seen this in redis-py) which is probably why we haven't yet.

I'd also see it as a negative testimony to the asyncio approach (aka blue/green functions), but that train has left the station and most of us are invested in one of those two approaches, so I can definitely see the value in an asyncio layer.

aaronclong commented 1 year ago

Yeah, I know the blue/green function debate is quite polarizing. However, as you said, the language is natively adopting the one approach. Eventually, I feel like even boto3 will be forced to adopt asyncio.

dbfreem commented 2 months ago

Hey, just curious if this ever gained traction. I find asyncio to be one of the easiest ways to improve I/O-bound apps.