eyeseast opened this issue 3 years ago
Datasette has its own implementation of a write queue for exactly this purpose - and there's no reason at all that it should stay in Datasette rather than being extracted out and moved over here to `sqlite-utils`.
One small concern I have is around the API design. I'd want to keep supporting the existing synchronous API while also providing a similar API with await-based methods.
What are some good examples of libraries that do this? I like how https://www.python-httpx.org/ handles it - maybe that's a good example to imitate?
Another option here would be to add https://github.com/omnilib/aiosqlite/blob/main/aiosqlite/core.py as a dependency - it's four years old now and actively maintained, and the code is pretty small, so it looks like a solid, stable, reliable dependency.
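For context, aiosqlite mirrors the standard `sqlite3` module but makes every call awaitable. Roughly like this (adapted from its README; the file and table names are made up):

```python
import asyncio
import aiosqlite

async def main():
    # aiosqlite mirrors the sqlite3 API, but every call is awaitable
    async with aiosqlite.connect("example.db") as db:
        await db.execute(
            "create table if not exists docs (id integer primary key, title text)"
        )
        await db.execute("insert into docs (title) values (?)", ("hello",))
        await db.commit()
        async with db.execute("select id, title from docs") as cursor:
            async for row in cursor:
                print(row)

asyncio.run(main())
```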
I think HTTPX gets it exactly right, with a clear separation between sync and async clients, each with a basically identical API. (I'm about to switch feed-to-sqlite over to it, from Requests, to eventually make way for async support.)
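For reference, the HTTPX pattern being praised is roughly this: two clients with matching method names, one blocking and one awaitable (a minimal sketch using the public httpx API, not sqlite-utils code):

```python
import asyncio
import httpx

# Sync client: blocking calls
with httpx.Client() as client:
    print(client.get("https://www.example.com/").status_code)

# Async client: the same method names, but awaitable
async def main():
    async with httpx.AsyncClient() as client:
        response = await client.get("https://www.example.com/")
        print(response.status_code)

asyncio.run(main())
```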
I have a hunch this is actually going to be quite difficult, due to the internal complexity of some of the `sqlite-utils` API methods.
Consider `db[table].extract(...)`, for example. It runs a whole bunch of extra queries inside the method, and each of those would need to be turned into an `await` call in the async version.
Writing this method twice - looking almost identical, but with `await` tucked in before every internal call that needs to execute SQL - is going to be pretty messy.
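To make the duplication problem concrete, here's a deliberately simplified, hypothetical pair of functions (not the real `extract()` body) showing how the same multi-query logic ends up written twice, once per paradigm:

```python
import sqlite3
import aiosqlite

def extract_sync(conn: sqlite3.Connection, table: str, column: str, other: str):
    # Several internal queries, all blocking
    conn.execute(
        f"create table if not exists [{other}] (id integer primary key, value text unique)"
    )
    conn.execute(
        f"insert or ignore into [{other}] (value) select distinct [{column}] from [{table}]"
    )
    conn.execute(
        f"update [{table}] set [{column}] = "
        f"(select id from [{other}] where value = [{table}].[{column}])"
    )
    conn.commit()

async def extract_async(conn: aiosqlite.Connection, table: str, column: str, other: str):
    # Identical logic, but every internal call needs an await in front of it
    await conn.execute(
        f"create table if not exists [{other}] (id integer primary key, value text unique)"
    )
    await conn.execute(
        f"insert or ignore into [{other}] (value) select distinct [{column}] from [{table}]"
    )
    await conn.execute(
        f"update [{table}] set [{column}] = "
        f"(select id from [{other}] where value = [{table}].[{column}])"
    )
    await conn.commit()
```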
One thing that would help a LOT is figuring out how to share the majority of the test code. If the exact same tests could run against both the sync and async versions with a bit of test trickery, maintaining parallel implementations would at least be a bit more feasible.
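One possible flavour of that test trickery, assuming pytest-asyncio and entirely hypothetical `sync_db` / `async_db` fixtures, is a helper that only awaits when it is handed an awaitable:

```python
import inspect
import pytest

async def maybe_await(value):
    # Await only if the parallel async API returned an awaitable
    return await value if inspect.isawaitable(value) else value

@pytest.fixture(params=["sync", "async"])
def db(request, sync_db, async_db):
    # sync_db / async_db are hypothetical fixtures providing the two APIs
    return sync_db if request.param == "sync" else async_db

@pytest.mark.asyncio
async def test_insert_and_count(db):
    await maybe_await(db["docs"].insert({"id": 1, "title": "hello"}))
    assert await maybe_await(db["docs"].count) == 1
```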
Here's a really wild idea: I wonder if it would be possible to run a source transformation against either the sync or the async versions of the code to produce the equivalent for the other paradigm?
Could that even be as simple as a set of regular expressions against the async version that strip out or replace the `await`, `async def` and `async for` statements?
If so... I could maintain just the async version, generate the sync version with a script and rely on robust unit testing to guarantee that this actually works.
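A naive sketch of that regex idea (purely illustrative - a real transform would need to be token-aware so it doesn't mangle strings and comments):

```python
import re

# Crude async -> sync rewrite rules
ASYNC_TO_SYNC = [
    (re.compile(r"\basync def\b"), "def"),
    (re.compile(r"\basync for\b"), "for"),
    (re.compile(r"\basync with\b"), "with"),
    (re.compile(r"\bawait\s+"), ""),
]

def make_sync(source: str) -> str:
    # Strip or replace async keywords to generate the sync variant of a module
    for pattern, replacement in ASYNC_TO_SYNC:
        source = pattern.sub(replacement, source)
    return source
```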
I had this resource by Seth Michael Larson saved: https://github.com/sethmlarson/pycon-async-sync-poster. I haven't had a look at it yet, but it may contain useful info.
On Twitter I mentioned passing an aiosqlite connection during `Database` creation. I'm not 100% familiar with the `sqlite-utils` codebase, so I may be wrong here, but maybe decorating internal functions could be an option? Then they would be awaited or not inside the decorator, depending on how they are called.
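A loose sketch of that decorator idea (all names here are hypothetical, not actual sqlite-utils internals). For simplicity it dispatches on a flag set at `Database` creation, and in async mode it runs the same blocking body in a thread so the caller can await it:

```python
import asyncio
import functools

def maybe_async(method):
    # In async mode, return an awaitable that runs the blocking body in a
    # thread; in sync mode, just call the body directly.
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.is_async:
            return asyncio.to_thread(method, self, *args, **kwargs)
        return method(self, *args, **kwargs)
    return wrapper

class Database:
    def __init__(self, conn, is_async=False):
        self.conn = conn
        self.is_async = is_async

    @maybe_async
    def execute(self, sql, params=None):
        return self.conn.execute(sql, params or [])
```

With something like this, `db.execute(...)` returns a result directly in sync mode and something you can `await` in async mode, without writing the body twice.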
"how to use some new tools to more easily maintain a codebase that supports both async and synchronous I/O and multiple async libraries" - yeah that's exactly what I need, thank you!
`unasync` is an implementation of the exact pattern I was talking about above - it uses the `tokenize` module from the Python standard library to apply some clever rules to transform an async codebase into a sync one. Docs: https://unasync.readthedocs.io/en/latest/ - implementation here: https://github.com/python-trio/unasync/blob/v0.5.0/src/unasync/__init__.py
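Based on the unasync docs, the build-time usage looks roughly like this (the directory layout here is hypothetical for sqlite-utils):

```python
# setup.py (sketch): generate sqlite_utils/_sync/ from sqlite_utils/_async/
# at build time using unasync's tokenize-based rules.
import unasync
from setuptools import setup

setup(
    name="sqlite-utils",
    cmdclass={
        "build_py": unasync.cmdclass_build_py(
            rules=[
                unasync.Rule("/sqlite_utils/_async/", "/sqlite_utils/_sync/"),
                unasync.Rule(
                    "/tests/_async/",
                    "/tests/_sync/",
                    additional_replacements={"_async": "_sync"},
                ),
            ]
        )
    },
)
```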
From the poster here: https://github.com/sethmlarson/pycon-async-sync-poster/blob/master/poster.pdf
Some other interesting background reading: https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.html - in particular, see how SQLAlchemy has an `await conn.run_sync(meta.drop_all)` mechanism for running methods that haven't themselves been provided in an async version.
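From those docs, the `run_sync()` pattern looks roughly like this (using the aiosqlite driver; the table is made up):

```python
import asyncio
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.ext.asyncio import create_async_engine

meta = MetaData()
docs = Table(
    "docs", meta,
    Column("id", Integer, primary_key=True),
    Column("title", String),
)

async def main():
    engine = create_async_engine("sqlite+aiosqlite:///data.db")
    async with engine.begin() as conn:
        # run_sync() calls code written against the synchronous API
        # from inside an async connection
        await conn.run_sync(meta.drop_all)
        await conn.run_sync(meta.create_all)
    await engine.dispose()

asyncio.run(main())
```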
Hah, or use this trick, which genuinely rewrites the code at runtime using a class decorator! https://github.com/python-happybase/aiohappybase/blob/0990ef45cfdb720dc987afdb4957a0fac591cb99/aiohappybase/sync/_util.py#L19-L32
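The general shape of that runtime trick, sketched very loosely (this is not the aiohappybase code, just the idea of wrapping every coroutine method in a blocking equivalent):

```python
import asyncio
import inspect

def synchronize(cls):
    # Replace every coroutine method on the class with a blocking wrapper
    # that drives it to completion on an event loop.
    for name, method in inspect.getmembers(cls, inspect.iscoroutinefunction):
        def make_blocking(coro_func):
            def blocking(*args, **kwargs):
                return asyncio.run(coro_func(*args, **kwargs))
            return blocking
        setattr(cls, name, make_blocking(method))
    return cls

@synchronize
class Example:
    async def fetch(self, value):
        return value * 2

print(Example().fetch(21))  # 42, no await needed
```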
Microsoft's Playwright Python library solves this problem by code-generating both its sync AND its async APIs: https://github.com/microsoft/playwright-python/tree/master/scripts
I've been thinking about this a bit lately while working on a project that involves moving a lot of data between SQLite files, Datasette and GeoJSON. This has me leaning toward the idea that something like `datasette query` would be a better place to do async queries.
I know there's a lot of overlap between `sqlite-utils` and Datasette, and maybe keeping `sqlite-utils` synchronous would let Datasette be entirely async and give a cleaner separation between the implementations.
Following our conversation last week, I want to note this here before I forget.
I've had a couple of situations where I wanted to do a bunch of updates in an async event loop but ran into SQLite's issues with concurrent writes. This feels like something `sqlite-utils` could help with.
Peewee ORM has a SQLite write queue that might be a good model. It uses threads or gevent, but I think that approach would translate well enough to asyncio.
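A rough sketch of what a single-writer queue on top of asyncio could look like (the names and structure here are hypothetical - not Datasette's or peewee's implementation):

```python
import asyncio
import sqlite3

class WriteQueue:
    # All writes are funneled through a single worker, so concurrent
    # coroutines never contend for SQLite's write lock.
    def __init__(self, path):
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.queue = asyncio.Queue()

    def _write(self, sql, params):
        cursor = self.conn.execute(sql, params)
        self.conn.commit()
        return cursor

    async def worker(self):
        while True:
            sql, params, future = await self.queue.get()
            try:
                # Run the blocking write off the event loop, one at a time
                future.set_result(await asyncio.to_thread(self._write, sql, params))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                self.queue.task_done()

    async def execute_write(self, sql, params=()):
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((sql, params, future))
        return await future

async def main():
    wq = WriteQueue("data.db")
    worker = asyncio.create_task(wq.worker())
    await wq.execute_write(
        "create table if not exists docs (id integer primary key, title text)"
    )
    await wq.execute_write("insert into docs (title) values (?)", ("hello",))
    worker.cancel()

asyncio.run(main())
```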
Happy to help with this, too.