matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0
11.82k stars 2.13k forks source link

Potentially replacing `twisted` with `asyncio`-based code #11777

Open ShadowJonathan opened 2 years ago

ShadowJonathan commented 2 years ago

This is an issue that'll collect all potential technical issues that removing twisted will encounter, and try to provide suitable solutions for those.


Ever since I've seen synapse, I've poked at trying to replace its asynchronous execution framework, twisted, with the standard asyncio library.

However, i've gotten to realise that replacing twisted is a much more involved task.

Specifically, twisted covers four main problem domains;

I was initially mainly focused on the first one, with asyncio replacing twisted's asynchronous primitives. This was a naive approach, though now that I know better, I'll write down my findings on the rest.


I'll first explain each domain, how twisted handles it, and how an asyncio-equivalent could replace it.

Asynchronous Execution

Twisted provides Deferred, which provides callback-based asynchronous support.

It has been sugared over the years to program the same as asyncio's (and python 3.4+'s) coroutine, first via @inlineCallbacks (which provided a way to yield a Deferred to the executor, to resume processing once that deferred has finished, de-callback-ing the function signature, very much like await), and then ultimately also supporting async/await natively.

This all does not do away with Deferred's callback-based nature, and a bunch of hacks exist in the synapse codebase to deal with this in a 'sane' manner.

Enter asyncio, which has three async primitives;

As a rule of thumb, Future is a low-level plumbing object, while Task is used to drive coroutines forward concurrently (and can be used to keep track of these threads of execution).

All are awaitable, which is to say that await is able to be used on them while running in an event loop, and that loop will handle that object.

Specifically, this addresses these major uses of twisted in synapse;

Web Frameworks

Twisted provides an extensive web framework, based around servelets, Resources which can respond to HTTP Requests (which synapse subclasses with SynapseRequest, to add additional details.)

This is by far the most extensive and integrated bit of twisted, which made it somewhat hard to find a suitable replacement.

By far, python has a ton of microservice frameworks, notably all similarly derived from the same idea that flask brings; annotation-based path routing.

I found two libraries which somewhat could fit the niche that synapse requires;

Tornado

Tornado provides a somewhat similar interface like twisted.web, though in my opinion, it has some problems, such as a stagnated release cycle (the last release was 2 years ago), and some crusted API which hasn't fully reconciled with their migration to asyncio.

This makes me recommend tornado less for a look at the following;

aiohttp

aiohttp provides a modest server module with a ton of features.

Most notably, while also adopting the annotation-based routing that flask does, aiohttp does two things which i think help it adapt to synapse's usecase;

First, it allows "Views", these look very similar to applets;

class MyView(web.View):
    async def get(self):
        return await get_resp(self.request)

    async def post(self):
        return await post_resp(self.request)

Second, it allows RouteTableDef, which is an object that collects routes, using annotations which self-document the route it is getting.

routes = web.RouteTableDef()

@routes.get('/get')
async def handle_get(request):
    ...

@routes.post('/post')
async def handle_post(request):
    ...

app.router.add_routes(routes)

This can also be applied to views;

@routes.view("/view")
class MyView(web.View):
    async def get(self):
        ...

    async def post(self):
        ...

Returning this object and collating it all into a generic add_routes on the app router could replace register_servlets.

Possibly, things like nested applications can logically separate large parts of the codebase, such as the admin apis, these could then be re-added as follows; app.add_subapp('/_synapse/admin/', admin).

aiohttp is a plenty old library, but it also finds a niche between simple and involved, I think it'll be maintained for a very long time.

Database Interaction

This is a bit of a tricky one, the main way where synapse touches twisted here is through twisted.enterprise.dbapi, which provides a DB-APIv2 wrapper around blocking calls, automatically placing this on a threadpool.

I've tried to find a replacement, but ultimately came up short, I think that there are a few reasons for this;

Regardless, .dbapi ultimately only provides a thin wrapper around the DB-API actual, scheduling blocking calls on threads.

asyncio largely obsoletes this, by instead inlining the IO into the current event loop.

While this might look like a performance hit on first glance, i think that it might actually become a slight improvement, as asyncio would be able to cooperatively advance execution on every single connection at once every loop, and an alternate IO-optimised loop (such as uvloop) could bring even further improvements.

psycopg3 provides an asyncio interface, which will largely mirror its existing interface, only with async def in front of most operations.

For sqlite, aiosqlite exists, scheduling all calls onto a single thread, allowing the event loop to continue while it churns away.

Integrated Testing

This is the bit i'm not entirely sure sure about yet, excluding the costs of rewriting a large amount of twisted-dominated test code with a pytest-esc equivalent (or the likes), synapse make heavy use of pump, basically time-travelling the reactor to get more results quicker.

I could not find a good equivalent in asyncio, while both asynctest and aiomock exist, both are not maintained.

This is still a somewhat open area of research, though the ecosystem is rich.


Motivation

Of course, all of this is not actionable without any motivation, "why switch from twisted?". The main reasons for staying would be; Stability, Developer Experience, Framework Integration (twisted.web).

All of these are extremely valid reasons, and I'm not rejecting them, this issue is more a gathering of "what if", or "what would it take" to switch to asyncio.

Though still, I think there are some good reasons to switch to asyncio;

Modern Ecosystem

Around the time of python 3.4, asyncio was poised to become the standard-library async framework, while many more existed at the time (Twisted, Tornado, gevent, etc.) concurrently.

This all happened around 2016, asyncio was the new kid on the block, twisted was the established framework, I wasn't around in 2016 to observe this, but I'd bet that for synapse, this choice was very obvious back then. (It also had an integrated asynchronous web framework, which i believe wasn't exactly a given back then)

Now, the situation has (in some definition) turned on its head, where asyncio is the prominent library in the ecosystem, and twisted is more and more becoming an exotic framework in comparison.

This is fine for synapse, but because of this, it cannot take advantage of modern libraries and speedups, such as uvloop (a libuv-based asyncio reactor, see also https://github.com/matrix-org/synapse/issues/10366)

And don't forget to mention the large amount of up-to-date and maintained libraries which can rely on asyncio's large userbase (through python), versus the few libraries which still support twisted through some combination of factors. I don't have exact details, but i believe quite a fair bit of helper libraries have been rejected because they do not have twisted support, only provide asyncio support, and/or are blocking. This could change that.

Future-proofing

Synapse is relatively stable, its now at a stage where it provides a good matrix experience to a sizable portion of the ecosystem.

However, with servers like dendrite and conduit each aiming to take a slice of synapse's share, this may change, and with its relative slowness to compiled code, synapse may become somewhat of a legacy product for matrix.

Twisted does not help with this, as it increasingly becomes alien to many new python developers, who have only known asyncio. This may also introduce bugs, as community contributors could rely on asyncio-like behaviour that twisted does not provide, which adds maintenance overhead.

It also doesn't attract community developers to really work with synapse, as most of its codebase is intricately intertwined with twisted's framework, which can be a daunting task to learn, work with, interact with, and then subsequently submit patches for. For asyncio, they could possibly already have pre-existing knowledge about its behaviour, and certain libraries.

Ultimately, I think that asyncio will outlast twisted, due to it being co-developed with python, and twisted being a very old library with many alternatives.

Unseen benefits

Furthermore, integrating with asyncio might bring unseen benefits.

The first one that i know of is speed, asyncio has some "C-accelerated" parts (such as Future, Task, and some other primitives.) which could already provide a speed boost compared to twisted.

This can be extended with uvloop, which (as described above) provides a libuv-based event loop optimised for heavy IO handling.

Secondly, with pyo3-asyncio, it is possible to write extensions in Rust that accelerate synapse considerably, and are exposed to python as normal awaitables. These would run on a separate rust event loop thread, providing actual access to multithreading. (Whereas the GIL prohibits a lot of concurrent execution, even with DB adapters often enough).


Acknowledging the maintenance cost

Of course, this all isn't as simple as the technical reasoning above makes it sound; there is a huge up-front investment and maintenance cost associated with this, first with the rewriting effort itself, then with adjusting to the new framework(s), and lastly maintainer retraining/teaching/learning, etc.

Only in the long run do the results really pay off, where synapse can become simpler to maintain and easier to be contributed to by community members, and potentially find new maintainers for.

The question (of course) for matrix-org (or element, who currently are synapse's maintainers) is if this is worth it.

Some reasoning for this could be visions for the future of synapse, a personal one (and one that I've had since i saw the project) would be that synapse is a testbed for MSCs and other experiments, as python trades strong type guarantees and speed for developer happiness.

As Synapse is today, with it being a rather centre piece of "Matrix", with it still being the largest server, forgoing stability for flexibility and future-proofing like this wouldn't make much sense, but this situation/cost-benefit calculation might (or could) change, and that's what this issue is for.

callahad commented 2 years ago

This is a really thorough summary; thank you for writing it up. I'm sympathetic to and in agreement with most of the points you raise, especially the broader shift in the community to asyncio.

There's a relevant quote from the second edition of Code Complete:

Standard Techniques The more a system relies on exotic pieces, the more intimidating it will be for someone trying to understand it for the first time. Try to give the whole system a familiar feeling by using standardized, common approaches.

Asyncio is the standardized, common approach. And Twisted has always been exotic.

So we're absolutely paying a cost there. However, we've mitigated much of the unfamiliarity through our pervasive use of standard async/await syntax, and we could likely achieve decent ecosystem integration by switching to Twisted's asyncioreactor. Those are tractable goals.

What's harder to justify at this moment in time is a wholesale switch to asyncio. There is consensus on the team that, if we had unlimited time, we would benefit from moving away from Twisted. But in the reality of constrained time, I do not believe those benefits yet outweigh the costs of moving.

I'm going to sleep on this issue before deciding exactly what to do about it, but it's a great reference regardless.

ShadowJonathan commented 2 years ago

and we could likely achieve decent ecosystem integration by switching to Twisted's asyncioreactor.

I've tried this before, in https://github.com/matrix-org/synapse/issues/8642 I thought that by doing this, asyncio could easily be "intermixed" with twisted, making an iterative process to change over twisted to asyncio possible. However, the wall I ran up against was that they don't intermix easily, every function then became "invisibly coloured" to either run in asyncio or twisted, and explicit conversions between the two had to be made.

This would lead to instability at runtime if one of these conversions were missed, and ultimately, after looking under the hood, i saw that the asyncio reactor did not provide many benefits to twisted code, as twisted did not delegate its async execution to it, only rudimentary IO, creating the aforementioned intermixing problem.

It is an approach, but a painful one, and my goal at the time was seamless and invisible transitioning, so I dropped it.