redis / redis-py

Redis Python client
MIT License
12.6k stars 2.52k forks source link

Redis Sans-IO for Dual Async and Sync Support. #2039

Closed seandstewart closed 3 weeks ago

seandstewart commented 2 years ago

Hey folks - I've been working hard on the performance issue the currently asyncio implementation faces when compared to https://github.com/aio-libs/aioredis-py/tree/v1.3.1

We're tracking this as part of the meta-issue https://github.com/aio-libs/aioredis-py/issues/1225 and is considered a top priority to encourage folks to adopt the async client. (Here's a direct link to the issue: https://github.com/aio-libs/aioredis-py/issues/1208)

As part of this work, I've also been cognizant of the overall maintenance burden of maintaining two client implementations in a single code-base. Using the principles laid out in https://sans-io.readthedocs.io/, I have taken a stab at implementing the core client logic in a Sans-IO, event-driven model. You can see the work here:

https://github.com/seandstewart/redis-py-sansio

I think this could be a very worthwhile endeavor to take on - it allows the codebase to diverge only in minimal ways for the specific IO implementations, as illustrated in the high-level client & connection modules.

Looking forward to more collaboration!

seandstewart commented 2 years ago

cc: @chayim, @Andrew-Chen-Wang

I'd love to discuss this more, let me know your thoughts 👍

Andrew-Chen-Wang commented 2 years ago

This looks great! Thanks so much for this! @seandstewart


Code-wise:

On an initial look through, it looks like the problem of ordering has been resolved really well. I love the variable docs. It was interesting reading through io module with the Events. A concern I have would be during a connection timeout / kill that was fixed in redis-py.

I'm not entirely sure what this line is waiting for; explanation would be great!

The PythonPaser has these readprimitive methods. Are these the only ones we need?

Nit picks:

Unless I'm not understanding the code properly, returning exceptions is a little... interesting? I'd expect explicit RedisError-inherited exceptions. Such as here: https://github.com/seandstewart/redis-py-sansio/blob/7400a38ae9fa92db7e1164a5731036675f7e3551/redis/sansio/_parser.py#L149 it would be nice to have

class RedisError(Exception):
    def __init__(self, exc=None):
        self.exc = exc
...
except Exception as e: raise RedisError(e)

Coming from a Django background, I'd prefer if things were explicit/verbose. For instance, many people already know what kw in kw_only means, but I'd still prefer the long kwargs, plural.


Organizationally, mostly for @chayim:

I think organizationally, the asyncio module can use the new code base as many people are getting used to the new code anyways + the interface hasn't changed much whereas the sync client would still default to the current implementation in redis-py with an additional option being the Sans IO implementation.


I will take a better look at this sometime in the last week of this month. Thanks again @seandstewart !!!

chayim commented 2 years ago

I really like this approach. Equally I echo explicit versus implicit choices. While personally I prefer implicit, for library and communal use I think explicit, and even verbose is more valuable.

I'd generally like to see the clients themselves become simpler, and rely on fixed features that are then included, and perhaps overridden re-implemented as necessary (clearly timeout is an example). I've been thinking of introducing a ConnectionFactory, as part of this - which in turn would make it easier for developers to discover connection types, though I don't yet understand how that would impact the async world.

@seandstewart First off, loving this. Second - WDYT?

seandstewart commented 2 years ago

@Andrew-Chen-Wang @chayim -

Thanks for taking a look at this! I'm so glad to find us all in this place and I appreciate the positive response to these efforts. I'll try and respond to each question in turn 👍

  1. Missing timeout/ bug-fixes.
    • Yes, the IO implementations are not complete, they’re meant to be a starting place, there are missing features/patches.
    • An important feature of each is they are meant to be “safe” for concurrency primitives (task-safe for asyncio and thread-safe for syncio) without the use of locks. This was a learning from analyzing the aioredis@1.3 implementation, which is, honestly, brilliant.
  2. "What is the Parser implementation 'waiting' for?"
    • The pure-Python parser is implemented using a generator (here be 🐉). The wait...() methods force the generator to yield if there is no data in the parse buffer. This allows the main process to context-switch, which in turn opens other workloads within the process to fill the buffer.
  3. The PythonPaser has these read<primitive>() methods. Are these the only ones we need?
    • Yes, these are the primitives which are implemented by the RESP3 protocol.
  4. Returning Exceptions vs. Raising Exceptions.
    • The PythonReader is implemented to mirror the behavior of hiredis.Reader. This allows our protocol to not care which one is used, which simplifies the implementation.
    • There are three classes of error modes which may occur:
      1. ResponseError: always returned, it’s the parsed value of an ERR response.
      2. InvalidResponse: always raised, the server returned an un-parseable response (likely we’re not talking to a valid Redis server).
      3. Any other exception: always raised. The only reason we catch a bare Exception is to ensure we clear the buffer. We don’t want to wrap the error at this low level, that happens in the Protocol.
      4. Wrapping these errors would introduce a behavioral change between the two readers, which we want to avoid.
  5. “many people already know what kw in kw_only means, but I'd still prefer the long kwargs, plural.”
    • I think this is in reference to the kw_only parameter for attr.define. I don’t have any control over that. FWIW, Python 3.10 adopted this same parameter for its dataclass implementation. I’m not adamant that we use attrs for these data objects, but it makes the definition of slot-based classes much simpler. Once 3.10 is the LCD, I’d advocate we drop the dependency.
  6. Implicit vs. Explicit
    • I prefer explicit, which is what I think (hope) this approach provides - an explicit protocol for encoding and decoding Redis Server & Client events.
  7. ConnectionFairy for discovery of other types of connections.
    • This is interesting - I’d love to learn more.
    • One of the things I learned while working on this was that the class-level differentiation between SSL, UDF, and other connection implementations is they’re unnecessary. Their behavior is implemented without a high level of abstraction or duplication in this approach, and as a benefit, they reduce the burden of consumers of this library. More than anything, these are configuration values to pass onto the socket constructor.
    • When it comes to IO-specific implementations, I think I see a use case, but again, I’d love to learn more. If we do this right then it could be possible to provide not only a sync and async implementation, but a tornado or trio implementation with minimal cost.

Super stoked to be collaborating on this and can’t wait to see where this discussion leads! Hope these responses are helpful.

synodriver commented 2 years ago

Looks like we've done the same thing. I have just write a resp protocol parser in python. The difference is sioresp only contains a parser, without any io-related implementation.

github-actions[bot] commented 1 year ago

This issue is marked stale. It will be closed in 30 days if it is not updated.

Andrew-Chen-Wang commented 1 year ago

This should be re-opened

kylebebak commented 1 year ago

Agreed that this should be reopened, especially if there's still a gap in performance between https://github.com/aio-libs/aioredis-py/tree/v1.3.1 and redis-py

chayim commented 1 year ago

Sorry folks, auto-closed, reopened.

seandstewart commented 1 year ago

I haven't forgotten about this proposal! Things have evolved over the last year, but I would still love to find a path forward with this 🚀

github-actions[bot] commented 5 months ago

This issue is marked stale. It will be closed in 30 days if it is not updated.