psycopg / psycopg

New generation PostgreSQL database adapter for the Python programming language
https://www.psycopg.org/psycopg3/
GNU Lesser General Public License v3.0
1.7k stars 160 forks source link

Trio support #29

Open jab opened 3 years ago

jab commented 3 years ago

From https://www.psycopg.org/psycopg3/docs/async.html it looks like the async support is tightly coupled to asyncio. Would you be willing to use anyio so that Trio users could also use psycopg3 async?

Thanks for your consideration and for your work on psycopg3!

dvarrazzo commented 3 years ago

Of course non-blocking psycopg3 is built on asyncio: asyncio is the primitive Python standard way to do write concurrent I/O bound code. Why it should be tightly coupled to anything else?

I tried anyio a few months ago: I found it buggy, not up to the task, and adding quite some overhead. Surely not ready for its prime, definitely not a foundation to build on.

Just this morning, in order to answer somebody else's email, I took a look at Trio and I just can't understand what is its point. It claims to allow to call asyncio code from sync code. I struggle to understand this use case; however, if Trio does allow to call asyncio code, why can't it call asyncio psycopg3? If people use trio because they don't like async/await keywords, why don't they use multithread- or greenthread- libraries?

psycopg3 supports blocking/multithread code, can be monkeypatched by eventled and gevent, and supports asyncio: does it really need anything on top of it?

Furthermore psycopg3 is built around a core of non-blocking functionalities, implemented as generators, yielding when it's time to block and maintaining the objects state (BaseConnection, BaseCursor...): on top of them the public interfaces are built in a blocking variant (Connection, Cursor) and an asyncio variant (AsyncConnection, AsyncCursor): which are just thin layer describing how to block. From my reply this morning, they may look this way:

class BaseCursor:
    def _execute_gen(self, query, params) -> PQgen[Resutl]:
        # a lot of work, yielding on wait

class Cursor(BaseCursor):
    def execute(self, query, params): -> Result
        with self._lock: # a sync lock
            return self._wait(self._execute_gen(query, params))  # wait based on epoll

class AsyncCursor(BaseCursor):
    async def execute(self, query, params) -> Result:
        async with self._lock:  # an async lock
            return await self._wait(self._execute_gen(query, params))  # wait based on the asyncio loop

Pretty much all of psycopg3 interaction with the network is really a bunch of non-blocking generators and state machines; the Cursor/AsyncCursor classes are like 100 lines of code each (and this code is 80 columns: most methods would be 3 lines long). If people would like to support a different way of coordinate tasks at the time of wait, another layer can be added (TrioConnection, TrioCursor, if you want...) The Base objects don't depend on thread or on asyncio: whenever they should wait, they yield. I'd die to see a TwistedConnection, just to say...

So, I don't preclude a Trio-flavoured variant of psycopg3: if anyone wants to create one they are welcome to base them on psycopg3 base objects and I would be happy to provide guidance. I will not work on it for free: I have plenty to do already to finish the project with the core paradigms and I am not available to support Trio without a sponsorship.

As a final feedback: I really struggle to understand what Trio is for? It goes to great length to explain how to do everything, but I just don't get the why. It just seems a huge construction on top of someone's dislike for the await keyword: I think they should do a better job to explain why someone should use Trio.

jab commented 3 years ago

Sorry I assumed a lot more shared context than I should have! Have you seen https://youtu.be/oLkfnc_UMcE or https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/? I think they do a great job of explaining why you should use Trio / structured concurrency rather than asyncio / unstructured concurrency.

jab commented 3 years ago

Oops, didn’t mean to close, reopening.

jab commented 3 years ago

And if you’re still skeptical, try implementing Happy Eyeballs with asyncio and then try with Trio, making sure to correctly handle all the edge cases. I’ve never seen this exercise fail to produce an “aha!” moment. Hope it does for you!

shamrin commented 3 years ago

As a final feedback: I really struggle to understand what Trio is for? It goes to great length to explain how to do everything, but I just don't get the why. It just seems a huge construction on top of someone's dislike for the await keyword: I think they should do a better job to explain why someone should use Trio.

@dvarrazzo, what resource gave you an idea that Trio doesn't use await keyword? It could help us improve Trio documentation.

Trio is async/await-based, just as asyncio. More on Trio's purpose: https://trio.readthedocs.io/en/stable/index.html

dvarrazzo commented 3 years ago

I have watched the video above, and yes, trio does things to coordinate tasks. Psycopg3 is a task that will not oppose to be coordinated by trio. What else should it do?

@shamarin: I didn't say that Trio doesn't use await, just that the dislike of it is perceivable (from the tutorial), and no, it is not clear from the page you link in what way trio is a better paradigm than multithreading or asyncio.

I invite you to read the conversation, implementation proof of concept, problem founds in https://github.com/psycopg/psycopg2/issues/1057. At the time anyio wasn't good enough for psycopg3 because it didn't allow to wait for RW-ready and the libpq non-blocking functions require it. So that's an end dead enough to consider anyio non compatible with PostgreSQL Be-Fe protocol.

I don't know if the authors has since improved the library: in case they did, it is a simple matter to implement an AnyioConnection, using an anyio lock and implement a waiting.wait_anyio() function to coordinate a psycopg3 connection with other concurrent tasks, and they have my blessing. Happy to do it myself if I get paid to do it.

dlax commented 2 years ago

I have started working on this topic. Here are the steps I imagine:

  1. port code from psycopg/ directory to use anyio instead of asyncio, almost done in #146
  2. port code from psycopg_pool/ directory
  3. port the test suite to use anyio and run it with both asyncio and trio backends

By not touching the test suite while doing steps 1 and 2, I think we would be confident that nothing breaks. And then port the test suite.

As far as I can tell, the code from psycopg_pool would need more significant changes (behavior changes, in fact) because we'll need to move the logic of worker and tasks startup (resp. stop) from __init__() (resp. __del__()) to __aenter__() (resp. __aexit__()). That's more work, but that's doable. On the other hand, it makes sense to handle these resources at context manager level, rather than object initialization, in my opinion; so this might be a good move anyways.

To be discussed...

dvarrazzo commented 2 years ago

That sounds correct: thank you for you work @dlax. The core part of psycopg would mostly work once the generator/wait work nicely together. The pool is more a machinery of its own and the thread/asyncio codebases have less in common.

I am still not entirely sold that the burden of maintaining trio should be on the psycopg side, for the simple reason that I have zero experience with it and I don't like to take responsibilities in fields I'm not competent.

parity3 commented 2 years ago

Sorry for the potential lack of value-add with this comment, but I think adding anyio/trio support will be a huge win for the trio community and using anyio instead of asyncio in core makes me feel better about the structure/flow of this library without even looking at it (opinion!).

Is there a way I can use $$$ to sway people this issue is important? I'm not familiar with using patreon or gitpay or other kinds of bounties and I don't know what this project's policy is on that. I certainly would not want to sway this project in a direction it wasn't meant for, or towards interests that are my own only.

dlax commented 2 years ago

The approach taken in #146 is rather to introduce anyio support as a way to allow usage of trio, this does not replace the asyncio implementation (though it's obviously possible to use the anyio implementation with asyncio backend), so this is conservative. The idea was to see how it works as is and maybe eventually switch the asyncio implementation to use anyio under the hood later on.

In the meantime, I have been looking at how the ecosystem handles the variety of async environments. One interesting case, because it is quite similar in purpose (a network client library), is httpx. So httpx (through httpcore) supports asyncio and trio async environments. Interestingly, while support for trio is native, support for asyncio is done through anyio! This has changed last year, see https://github.com/encode/httpcore/issues/344 for details. Also interesting is that there is only one interface (httpcore.AsyncConnectionPool) and that the backend to use is automatically detected.

Anyways, I'm still committed to work on this feature, continuing #146; but we need a consensus with @dvarrazzo before I invest more time.

dvarrazzo commented 2 years ago

@parity3 I am not opposed, but currently not directly involved, in trio support. I have some catch up to do with @dlax (mostly about other tasks, but we will include this too).

You are free to fund the project and to require that certain feature get prioritised. Should it happen, I will make sure that such support will reach the people who have contributed to the feature the most.

parity3 commented 2 years ago

I'm going to ask around on how best to set this up. Should I just try github sponsors? I don't want to be in a formal creditors list anywhere (although this issue is public). This is a personal contribution (not large by most standards). @dlax re:anyio integration, there has been talk about whether anyio is ready or too in-flux to be considered as the "only" core, and I totally get that. But I'm sure integration work here could help move the anyio project along as well (although it should be getting a fair amount of reliability from the fastapi project integration by now).

gnat commented 2 years ago

3.11 asyncio will get trio-like Task Groups and Exception Groups image

https://mobile.twitter.com/1st1/status/1493748843430567942

https://realpython.com/python311-exception-groups/

https://bugs.python.org/issue46752

There's not really much point in Trio moving forward.

GalaxySnail commented 2 years ago

3.11 asyncio will get trio-like Task Groups and Exception Groups

There's not really much point in Trio moving forward.

No, the key point of structured concurrency is to completely remove “go statements”, in asyncio it is asyncio.create_task, which breaks control flow abstractions. The blog memtioned here has talked about it. AFAIK because of backward compatibility, asyncio.create_task may never be removed in the future. Asyncio will get taskgroups, great, but it can't replace trio.

On the other hand, trio's API is much more simpler than asyncio, which makes trio easier to use.

saolof commented 1 year ago

I think the main point is not as much "use Trio" as "don't use asyncio.create_task and be cancellation-friendly". You can avoid it in a vanilla asyncio implementation by having 3.11 as a minimum version, otherwise using anyio will let you do that in a way that is backwards compatible with the python versions that most people still use and with alternative async runtimes, while being basically zero-overhead in 3.11 and up where its paradigm has native support in asyncio.