upstash / redis-py

HTTP based Python Redis Client for Serverless and Edge Functions
https://docs.upstash.com/redis
MIT License
29 stars 3 forks source link

asyncio #4

Closed chronark closed 1 year ago

chronark commented 1 year ago
  • Asyncio is not dominant yet. Providing an API only for it might hurt the adoption of the library. I think we should do what most of the other libraries do, and provide sync + async APIs on the same package, and the default implementation should have sync APIs: Basically, upstash_redis.Redis and upstash_redis.aio.Redis, which provides the sync and async versions of the same API.
chronark commented 1 year ago

I thought this was internal stuff and not something the user sees? Maybe misunderstood.

we're building at the edge here and I think targeting early adopters is fine, but I don't know python well enough to make a good call here.

is asyncio becoming the new default? or will it live alongside the current thing?

TudorZgimbau commented 1 year ago

This is highly opinionated, but I'll make the case for asyncio:

The basic idea is that asyncio provides a relatively easy way to build asynchronous functionality directly into python code, similar to the way Promises do on javascript. However, the difference is that asyncio is not as widely adopted yet and only supporting it might cause some users to be unable to use the library.

Now, let me explain why upstash_redis is based on it:

chronark commented 1 year ago

Isn't it synchronous if I add await in front of it? why do I need 2 different ways if it's so trivial to go from async -> sync?

mdumandag commented 1 year ago

@chronark It affects the public API of the client. It is basically the problem mentioned in this famous article. You can't await async functions within sync functions directly.

As for the other question, I don't think it will be the default anytime soon. It was introduced in Python 3.4, 9 years ago, and possibly due to some inherent problems of it, it didn't get adopted as expected. If you take a look at the sync vs the async versions of the popular HTTP clients, web frameworks, database drivers etc. in which asyncio would be the most useful for, there is a huge difference between the usage statistics, in favor of the sync versions.

I don't agree that running async code within the sync code is necessarily easier. The reverse is also quite easy: https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor

But, from my experience, it is hard to convince people to run async code in sync code base or vice versa, unless they have to.

I don't know much about the serverless environments, but I assume the libraries I shared above are the main building blocks, along with the official SDKs like https://github.com/boto/boto3 and https://github.com/aws/chalice (both of which are sync APIs). If this is the case, looking at the usage statistics above, I believe it would hurt the adoption of the client if we only provide an async API.

TudorZgimbau commented 1 year ago

As I said before, it's a highly opinionated topic. IMO, the problem with simply looking at the stats is the fact that python functioned without the async syntax for enough years to create legacy codebases. Adopting asyncio might still be a fragile topic for enterprises (like AWS, as you mentioned) and the simple fact that their libs are synchronous means a lot of downloads for their synchronous dependencies too.

Although I'm not an expert in the asyncio ecosystem and its relationship to serverless, I'm thinking that a good portion of our users are not enterprises with compatibility problems or long chain of approvals, but startups looking for cutting edge technologies and fast iteration.

I will repeat this so it's clear: I'm not saying, in any way, that we shouldn't have a synchronous client (which btw, I'd make a separate package: why install all of this unnecessary things ?!). What I'm expressing is my preference for the async version and hence the reason it got out first.

Edit: Thanks for pointing out that it's easy to run synchronous code async too. What I was thinking initially was simply how easy is to just await in async functions.

mdumandag commented 1 year ago

Yes, I get that it is an opiniated topic, but I don't think there are any data points that would suggest that making an async client would be better for us, or such APIs are preferred by the serverless users. For the reverse, I think the data I shared above should correlate with the serverless use cases as well, as it is nothing more than a Python function which uses libraries etc. I believe even the official SDK having only sync APıs should make a strong case for this.

If we are going to have one type of API, my vote would be to provide a sync API.

The reason most libraries provide sync and async APIs in the same package is that, it is easier to share code. With a nice abstraction, it should be trivial to have sync and async clients on the same package. For example, take a look at the way redis client works: It just have two versions of the client that inherit from the same/similar mix-ins https://github.com/redis/redis-py/blob/master/redis/client.py#L852, https://github.com/redis/redis-py/blob/master/redis/asyncio/client.py#L84

TudorZgimbau commented 1 year ago

And it's my opinion that data points don't necessarily help here since things like boto3 boost sync libraries downloads without users explicitly preferring this. If you want a more accurate estimation, I'd say we should ask potential users.

I'm not a fan of redis-py and although I understand the idea of keeping the code together, I'm also thinking about separating concerns and reducing bundle sizes. One good example is Motor, the python async driver for MongoDB, which is operated separately from the synchronous PyMongo.

mdumandag commented 1 year ago

Yes, that is also an option we can consider. I don't mind doing that, although keeping two libraries at sync would be problematic. Maybe we can extract the common parts into a seperate package and let sync and async variants depend on it.

TudorZgimbau commented 1 year ago

Yeah, I think this might work, maybe something like upstash_redis_core. We'd probably have to refactor to be able to reuse some things tho.

mdogan commented 1 year ago

Even though I've not any experience with Python and its ecosystem, I'd like to jump into this discussion from API design perspective.

From what I understand by reading above thread, I think we should go with both async and sync API. If any/some dependency on their stack forces people to use sync API, then we should definitely provide the sync signature too. Yes, our users are mostly early adopters & startups, but these days some mid-size companies are jumping into serverless / edge etc wagon too.

I know for the initial release we have/had limited time. That's why I can understand to go with only async API option. But that should not limit us to stay on async always. At least in the subsequent release, we should be able to provide sync signature, without breaking the compatibility. If that's (compat guarantee) not possible, then this is a blocker for us.

How to expose both signatures is a technical topic, either separate packages or single one. Although I prefer the single package for both, because of easier maintainability. I don't think a few KBs of more package size is that much critical.

chronark commented 1 year ago

Thanks for all the valid concerns, this is far trickier than I initially understood.

I think the consensus is that in the future we want to support both sync and async.

I'd also prefer 1 package for both and don't expect bundle size to be an issue. Python runs in serverless functions and not something like v8 isolates where size is critical. Serverless functions like aws lambda are pretty relaxed about total size. Is size the only downside of doing it in one package?

My suggestion is we launch with async as beta, since most of that is ready. The early adopters will likely be happy enough with an async-only sdk and we will work on adding a sync api by refactoring the core logic.

After adding sync, we can announce GA and hopefully have gotten some feedback by then.

Let's add eval as sync api too, then we can offer both sync and async versions in ratelimit-python This lets us figure out how to offer both apis in 1 package and see if it works like we think it does.

It's a good idea to dogfood the redis sdk in ratelimit to check for potential problems early.

TudorZgimbau commented 1 year ago

Thanks for your insights @mdogan! What @chronark said pretty much summarises it, but I'd like to add a few things.

I don't believe we would step on any sort of compatibility issues regarding async vs sync and if you really aim to have both released at the same time, I believe it's doable. The real problem is finding an elegant way to reuse the core logic / commands on both versions, which implies more work and more importantly, a consensus from the maintainers regarding how it'll be done.

That being said, I recommend launching only with async and prioritising the separation of core logic for the next release. Regarding eval, we can make a sync version in the current package, but it'll look kinda ugly since we have to replace the HTTP library too. And we still have to decide how we're going to package ratelimit for both sync and async.

Which brings me to my last point. If we want to ensure full compatibility and launch with only async, I'd let it be the default (what users will get when importing from upstash_redis) and make sync a separate package under the same distribution. I think this might be the opposite of what redis-py does (sync by default), but IMO it better fits our situation and target users.