Open ShadowJonathan opened 2 years ago
This is a really thorough summary; thank you for writing it up. I'm sympathetic to and in agreement with most of the points you raise, especially the broader shift in the community to asyncio.
There's a relevant quote from the second edition of Code Complete:
Standard Techniques The more a system relies on exotic pieces, the more intimidating it will be for someone trying to understand it for the first time. Try to give the whole system a familiar feeling by using standardized, common approaches.
Asyncio is the standardized, common approach. And Twisted has always been exotic.
So we're absolutely paying a cost there. However, we've mitigated much of the unfamiliarity through our pervasive use of standard async/await
syntax, and we could likely achieve decent ecosystem integration by switching to Twisted's asyncioreactor
. Those are tractable goals.
What's harder to justify at this moment in time is a wholesale switch to asyncio. There is consensus on the team that, if we had unlimited time, we would benefit from moving away from Twisted. But in the reality of constrained time, I do not believe those benefits yet outweigh the costs of moving.
I'm going to sleep on this issue before deciding exactly what to do about it, but it's a great reference regardless.
and we could likely achieve decent ecosystem integration by switching to Twisted's asyncioreactor.
I've tried this before, in https://github.com/matrix-org/synapse/issues/8642 I thought that by doing this, asyncio could easily be "intermixed" with twisted, making an iterative process to change over twisted to asyncio possible. However, the wall I ran up against was that they don't intermix easily, every function then became "invisibly coloured" to either run in asyncio or twisted, and explicit conversions between the two had to be made.
This would lead to instability at runtime if one of these conversions were missed, and ultimately, after looking under the hood, i saw that the asyncio reactor did not provide many benefits to twisted code, as twisted did not delegate its async execution to it, only rudimentary IO, creating the aforementioned intermixing problem.
It is an approach, but a painful one, and my goal at the time was seamless and invisible transitioning, so I dropped it.
This is an issue that'll collect all potential technical issues that removing
twisted
will encounter, and try to provide suitable solutions for those.Ever since I've seen
synapse
, I've poked at trying to replace its asynchronous execution framework,twisted
, with the standardasyncio
library.However, i've gotten to realise that replacing
twisted
is a much more involved task.Specifically, twisted covers four main problem domains;
Deferred
, mainly, but also by providing IO primitives and such.twisted.web
; Resources, routing, servelets, etc.twisted.enterprise.dbapi
provides a wrapper around any DB-APIv2-compatible database driver, making sure any blocking calls are executed on a threadpool, while returning an awaitable.tests/
I was initially mainly focused on the first one, with asyncio replacing twisted's asynchronous primitives. This was a naive approach, though now that I know better, I'll write down my findings on the rest.
I'll first explain each domain, how twisted handles it, and how an asyncio-equivalent could replace it.
Asynchronous Execution
Twisted provides
Deferred
, which provides callback-based asynchronous support.It has been sugared over the years to program the same as asyncio's (and python 3.4+'s)
coroutine
, first via@inlineCallbacks
(which provided a way toyield
aDeferred
to the executor, to resume processing once that deferred has finished, de-callback-ing the function signature, very much likeawait
), and then ultimately also supportingasync
/await
natively.This all does not do away with
Deferred
's callback-based nature, and a bunch of hacks exist in the synapse codebase to deal with this in a 'sane' manner.Enter
asyncio
, which has three async primitives;coroutine
, a language-internal object, essentially a more opinionated generator.asyncio.Future
, a low-level callback-based async primitive, mainly used for IO-based operations, cross-thread execution, and other black magic.asyncio.Task
, an object wrapping any awaitable, this will run on the loop until completion. Any other async thread can await this object at any time to yield until the result is ready. (Even after the task has finished, then it returns immediately).As a rule of thumb,
Future
is a low-level plumbing object, whileTask
is used to drive coroutines forward concurrently (and can be used to keep track of these threads of execution).All are awaitable, which is to say that
await
is able to be used on them while running in an event loop, and that loop will handle that object.Specifically, this addresses these major uses of twisted in synapse;
Reactor
=> Event LoopIDelayedCall
=>Task
+asyncio.sleep()
orloop.call_later
LoopingCall
=>Task
+while True:
Failure
=> simple exception raisingdefer.gatherResults
=>asyncio.gather
Web Frameworks
Twisted provides an extensive web framework, based around servelets,
Resources
which can respond to HTTPRequest
s (which synapse subclasses withSynapseRequest
, to add additional details.)This is by far the most extensive and integrated bit of twisted, which made it somewhat hard to find a suitable replacement.
By far, python has a ton of microservice frameworks, notably all similarly derived from the same idea that flask brings; annotation-based path routing.
I found two libraries which somewhat could fit the niche that synapse requires;
Tornado
Tornado provides a somewhat similar interface like
twisted.web
, though in my opinion, it has some problems, such as a stagnated release cycle (the last release was 2 years ago), and some crusted API which hasn't fully reconciled with their migration to asyncio.This makes me recommend tornado less for a look at the following;
aiohttp
aiohttp
provides a modest server module with a ton of features.Most notably, while also adopting the annotation-based routing that flask does, aiohttp does two things which i think help it adapt to synapse's usecase;
First, it allows "Views", these look very similar to applets;
Second, it allows
RouteTableDef
, which is an object that collects routes, using annotations which self-document the route it is getting.This can also be applied to views;
Returning this object and collating it all into a generic
add_routes
on the app router could replaceregister_servlets
.Possibly, things like nested applications can logically separate large parts of the codebase, such as the admin apis, these could then be re-added as follows;
app.add_subapp('/_synapse/admin/', admin)
.aiohttp
is a plenty old library, but it also finds a niche between simple and involved, I think it'll be maintained for a very long time.Database Interaction
This is a bit of a tricky one, the main way where synapse touches twisted here is through
twisted.enterprise.dbapi
, which provides a DB-APIv2 wrapper around blocking calls, automatically placing this on a threadpool.I've tried to find a replacement, but ultimately came up short, I think that there are a few reasons for this;
twisted.enterprise.dbapi
in the first place. This has largely been obsoleted by asyncio, as it provides IO primitives instead.Regardless,
.dbapi
ultimately only provides a thin wrapper around the DB-API actual, scheduling blocking calls on threads.asyncio
largely obsoletes this, by instead inlining the IO into the current event loop.While this might look like a performance hit on first glance, i think that it might actually become a slight improvement, as asyncio would be able to cooperatively advance execution on every single connection at once every loop, and an alternate IO-optimised loop (such as
uvloop
) could bring even further improvements.psycopg3
provides an asyncio interface, which will largely mirror its existing interface, only withasync def
in front of most operations.For sqlite, aiosqlite exists, scheduling all calls onto a single thread, allowing the event loop to continue while it churns away.
Integrated Testing
This is the bit i'm not entirely sure sure about yet, excluding the costs of rewriting a large amount of twisted-dominated test code with a
pytest
-esc equivalent (or the likes), synapse make heavy use ofpump
, basically time-travelling the reactor to get more results quicker.I could not find a good equivalent in asyncio, while both
asynctest
andaiomock
exist, both are not maintained.This is still a somewhat open area of research, though the ecosystem is rich.
Motivation
Of course, all of this is not actionable without any motivation, "why switch from twisted?". The main reasons for staying would be; Stability, Developer Experience, Framework Integration (
twisted.web
).All of these are extremely valid reasons, and I'm not rejecting them, this issue is more a gathering of "what if", or "what would it take" to switch to asyncio.
Though still, I think there are some good reasons to switch to asyncio;
Modern Ecosystem
Around the time of python 3.4,
asyncio
was poised to become the standard-library async framework, while many more existed at the time (Twisted, Tornado, gevent, etc.) concurrently.This all happened around 2016,
asyncio
was the new kid on the block,twisted
was the established framework, I wasn't around in 2016 to observe this, but I'd bet that for synapse, this choice was very obvious back then. (It also had an integrated asynchronous web framework, which i believe wasn't exactly a given back then)Now, the situation has (in some definition) turned on its head, where
asyncio
is the prominent library in the ecosystem, andtwisted
is more and more becoming an exotic framework in comparison.This is fine for synapse, but because of this, it cannot take advantage of modern libraries and speedups, such as
uvloop
(a libuv-based asyncio reactor, see also https://github.com/matrix-org/synapse/issues/10366)And don't forget to mention the large amount of up-to-date and maintained libraries which can rely on asyncio's large userbase (through python), versus the few libraries which still support twisted through some combination of factors. I don't have exact details, but i believe quite a fair bit of helper libraries have been rejected because they do not have twisted support, only provide asyncio support, and/or are blocking. This could change that.
Future-proofing
Synapse is relatively stable, its now at a stage where it provides a good matrix experience to a sizable portion of the ecosystem.
However, with servers like dendrite and conduit each aiming to take a slice of synapse's share, this may change, and with its relative slowness to compiled code, synapse may become somewhat of a legacy product for matrix.
Twisted does not help with this, as it increasingly becomes alien to many new python developers, who have only known
asyncio
. This may also introduce bugs, as community contributors could rely on asyncio-like behaviour that twisted does not provide, which adds maintenance overhead.It also doesn't attract community developers to really work with synapse, as most of its codebase is intricately intertwined with twisted's framework, which can be a daunting task to learn, work with, interact with, and then subsequently submit patches for. For asyncio, they could possibly already have pre-existing knowledge about its behaviour, and certain libraries.
Ultimately, I think that asyncio will outlast twisted, due to it being co-developed with python, and twisted being a very old library with many alternatives.
Unseen benefits
Furthermore, integrating with asyncio might bring unseen benefits.
The first one that i know of is speed,
asyncio
has some "C-accelerated" parts (such asFuture
,Task
, and some other primitives.) which could already provide a speed boost compared to twisted.This can be extended with
uvloop
, which (as described above) provides alibuv
-based event loop optimised for heavy IO handling.Secondly, with
pyo3-asyncio
, it is possible to write extensions in Rust that accelerate synapse considerably, and are exposed to python as normal awaitables. These would run on a separate rust event loop thread, providing actual access to multithreading. (Whereas the GIL prohibits a lot of concurrent execution, even with DB adapters often enough).Acknowledging the maintenance cost
Of course, this all isn't as simple as the technical reasoning above makes it sound; there is a huge up-front investment and maintenance cost associated with this, first with the rewriting effort itself, then with adjusting to the new framework(s), and lastly maintainer retraining/teaching/learning, etc.
Only in the long run do the results really pay off, where synapse can become simpler to maintain and easier to be contributed to by community members, and potentially find new maintainers for.
The question (of course) for matrix-org (or element, who currently are synapse's maintainers) is if this is worth it.
Some reasoning for this could be visions for the future of synapse, a personal one (and one that I've had since i saw the project) would be that synapse is a testbed for MSCs and other experiments, as python trades strong type guarantees and speed for developer happiness.
As Synapse is today, with it being a rather centre piece of "Matrix", with it still being the largest server, forgoing stability for flexibility and future-proofing like this wouldn't make much sense, but this situation/cost-benefit calculation might (or could) change, and that's what this issue is for.