Closed simonw closed 5 years ago
I think I can do this almost entirely within my existing BaseView class structure.
First, decouple the async data() methods by teaching them to take a querystring object as an argument instead of a Sanic request object. The get() method can then send that new object instead of a request.
Next teach the base class how to obey the ASGI protocol.
I should be able to get support for both Sanic and uvicorn/daphne working in the same codebase, which will make it easy to compare their performance.
Thinking about this further, maybe I should embrace ASGI turtles-all-the-way-down and teach each datasette view class to take a scope to the constructor and act entirely as an ASGI component. Would be a nice way of diving deep into ASGI and I can add utility helpers for things like querystring evaluation as I need them.
I built this ASGI debugging tool to help with this migration: https://asgi-scope.now.sh/fivethirtyeight-34d6604/most-common-name%2Fsurnames.json?foo=bar&bazoeuto=onetuh&a=.
Results of an extremely simple micro-benchmark comparing the two shows that uvicorn is at least as fast as Sanic (benchmarks a little faster with a very simple payload): https://gist.github.com/simonw/418950af178c01c416363cc057420851
https://github.com/encode/uvicorn/blob/572b5fe6c811b63298d5350a06b664839624c860/uvicorn/run.py#L63 is how you start a Uvicorn server from code as opposed to the uvicorn
CLI
from uvicorn.run import UvicornServer
UvicornServer().run(app, host=host, port=port)
This looks VERY relevant: https://github.com/encode/starlette
I’m up for helping with this.
Looks like you’d need static files support, which I’m planning on adding a component for. Anything else obviously missing?
For a quick overview it looks very doable - the test client ought to me your test cases stay roughly the same.
Are you using any middleware or other components for the Sanic ecosystem? Do you use cookies or sessions at all?
No cookies or sessions - no POST requests in fact, Datasette just cares about GET (path and querystring) and being able to return custom HTTP headers.
Okay. I reckon the latest version should have all the kinds of components you'd need:
Recently added ASGI components for Routing and Static Files support, as well as making few tweaks to make sure requests and responses are instantiated efficiently.
Don't have any redirect-to-slash / redirect-to-non-slash stuff out of the box yet, which it looks like you might miss.
I'm now hacking around with an initial version of this in the starlette branch.
Here's my work in progress, deployed using datasette publish now fixtures.db -n datasette-starlette-demo --branch=starlette --extra-options="--asgi"
https://datasette-starlette-demo.now.sh/
Lots more work to do - the CSS isn't being served correctly for example, it's showing this error when I hit /-/static/app.css
:
INFO: 127.0.0.1 - "GET /-/static/app.css HTTP/1.1" 200
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 363, in run_asgi
result = await asgi(self.receive, self.send)
File "/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/starlette/staticfiles.py", line 91, in __call__
await response(receive, send)
File "/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/starlette/response.py", line 180, in __call__
{"type": "http.response.body", "body": chunk, "more_body": False}
File "/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 483, in send
raise RuntimeError("Response content shorter than Content-Length")
RuntimeError: Response content shorter than Content-Length
It looks like that's a bug in Starlette - filed here: https://github.com/encode/starlette/issues/32
Tom shipped my fix for that bug already, so https://datasette-starlette-demo.now.sh/ is now serving CSS!
Some notes:
.env
file, but I don't see ASGI itself having a specific interface there.Wow, this issue has been open for a full year now!
I've been thinking about this a lot. I've decided I want Datasette to use ASGI 3.0 internally with no dependencies on anything else - then I want the option to run Datasette under both daphne and uvicorn - because uvicorn doesn't support Python 3.5 but Datasette still needs to (primarily for Glitch), and daphne works with 3.5.
So I'm going to try to go the following route:
pip install datasette
you get Daphne as a dependency (I'd like you to be able to opt-out of installing Daphne, I'm not yet sure how that would work)asgi_serve
plugin hook allows a plugin to serve Datasette using uvicorn (or hypercorn) insteadI said earlier that I only need to support GET - I actually need to be able to support POST too, mainly to support plugins (e.g. a plugin that allows authenticated login before you can view Datasette, but potentially also plugins that let you write data directly to SQLite as well).
While I'm not depending on Starlette any more I will need to instead depend on https://github.com/andrew-d/python-multipart for POST form parsing - as used by Starlette here https://github.com/encode/starlette/blob/ab86530eddfcf56e0f7e5ca56f6ab69c15594a7d/starlette/requests.py#L178-L193
Bah, I'd much rather depend on Starlette for things like form parsing - but it's 3.6+ only!
https://github.com/encode/starlette/blob/ab86530eddfcf56e0f7e5ca56f6ab69c15594a7d/setup.py#L39
Maybe I could require Python 3.6 or higher if you want to handle POST data? This would make my internals far too complicated though I think.
https://github.com/simonw/datasette/commit/9fdb47ca952b93b7b60adddb965ea6642b1ff523 added decode_path_component()
and encode_path_component()
functions because ASGI decodes %2F encoded slashes in URLs automatically. The new encoding scheme looks like this:
"table/and/slashes" => "tableU+002FandU+002Fslashes"
"~table" => "U+007Etable"
"+bobcats!" => "U+002Bbobcats!"
"U+007Etable" => "UU+002B007Etable"
For background see this comment: https://github.com/django/asgiref/issues/51#issuecomment-450603464
Useful context stuff:
ASGI decodes %2F encoded slashes in URLs automatically
raw_path
for ASGI looks to be under consideration: https://github.com/django/asgiref/issues/87
uvicorn doesn't support Python 3.5
That was an issue specifically against the <=3.5.2 minor point releases of Python, now resolved: https://github.com/encode/uvicorn/issues/330 👍
Starlette for things like form parsing - but it's 3.6+ only!
Yeah - the bits that require 3.6 are anywhere with the "async for" syntax. If it wasn't for that I'd downport it, but that one's a pain. It's the one bit of syntax to watch out for if you're looking to bring any bits of implementation across to Datasette.
OK, time for a solid implementation plan.
As soon as https://github.com/django/asgiref/pull/92 is merged (hopefully very soon) the ASGI spec will have support for an optional raw_path
- which means we can continue to use table%2Fnames
with embedded /
without being unable to tell if a path has been decoded or not.
Steps to implement:
Add a .asgi(self, scope, receive, send)
method to my base view class. This will expose an ASGI interface to the outside world: the method itself will construct a request object and call the existing .get()
method.
My only true shared base class is actually RenderMixin
because the IndexView
doesn't extend BaseView
. I'm going to refactor the class hierarchy a bit here - AsgiView
will be my top level class with the .asgi()
method on it. RenderMixin
will be renamed BaseView(AsgiView)
, while existing BaseView
will be renamed DataView(BaseView)
since it mainly exists to introduce the handy .data()
abstraction.
So...
AsgiView
- has .asgi()
method, extends Sanic HTTPMethodView
(for the moment)BaseView(AsgiView)
- defines utility methods currently on RenderMixin
IndexView(BaseView)
- the current IndexView
DataView(BaseView)
- defines the utilities currently on BaseView
, including data()
DataView
DatasetteView
I considered calling this RouteView
, but one of the goals of this project is to allow other ASGI apps to import Datasette itself and reuse it as its own ASGI function.
So DatasetteView
will subclass BaseView
and will do all of the routing logic. That logic currently lives here:
Almost all of the unit tests currently use app_client.get("/path...")
. I want to be able to run tests against both ASGI and existing-Sanic, so for the moment I'm going to teach app_client.get()
to use ASGI instead but only in the presence of a new environment variable. I can then have Travis run the tests twice - once with that environement variable and once without.
Uvicorn supports Python 3.5 again as of https://github.com/encode/uvicorn/issues/330 - so it's going to be the new dependency for Datasette.
Just some sanity checking to make sure there aren't any weird issues.
Hopefully this will just involve changes being made to the AsgiView base class, since that subclasses Sanic HTTPMethodView
.
Bonus: It looks like dropping Sanic as a dependency in favour of Uvicorn should give us Windows support! https://github.com/encode/uvicorn/issues/82
I'll probably revert 9fdb47ca952b93b7b60adddb965ea6642b1ff523 from https://github.com/simonw/datasette/issues/272#issuecomment-494192779 since I won't need it now that ASGI is getting raw_path
support.
For reference, here's some WIP code I wrote last year against the old ASGI 2 spec: https://github.com/simonw/datasette/commit/4fd36ba2f3f91da7258859808616078e3464fb97
For the routing component: I'm going to base my implementation on the one from Django Channels.
Documented here: https://channels.readthedocs.io/en/latest/topics/routing.html#urlrouter
Particularly relevant: my view classes need access to the components that were already parsed out of the URL by the router. I'm going to copy the Django Channels mechanism of stashing those in scope["url_route"]["kwargs"]
.
Started sketching out the router in the asgi
branch: https://github.com/simonw/datasette/commit/7cdc55c6836fe246b1ca8a13a965a39991c9ffec
I have an open pull request to Uvicorn with an implementation of raw_path
: https://github.com/encode/uvicorn/pull/372
How should file serving work?
Starlette and Sanic both use aiofiles
- https://github.com/Tinche/aiofiles - which is a small wrapper around file operations which runs them all in an executor thread. It doesn't have any C dependencies so it looks like a good option. Quart uses it too.
aiohttp
does things differently: it has an implementation based on sendfile with an alternative fallback which reads chunks from a file object and yields them one chunk at a time,
Uvicorn 0.8.1 is our and supports raw_path
!
I need to be able to define the URL routes once and have them work for both Sanic and ASGI.
I'm going to extract the web application bits out of the Datasette
class into a DatasetteServer
class. Then I can have a add_route()
method on that class, then have DatasetteSanic
and DatasetteAsgi
subclasses which redefine that method.
Getting this to work with both Sanic AND ASGI at the same time (via the classes described previously with an --asgi
command-line option) is proving way trickier than I expected, mainly because of the complexity of the current Datasette.app() method.
I'm going to drop the compatibility path for a bit and see if I can make progress on a pure-ASGI port.
Lots still to do:
I'm going to work on getting the unit test framework to be ASGI-compatible next.
Published an in-progress demo:
datasette publish now fixtures.db -n datasette-asgi-early-demo --branch=asgi
Here it is: https://datasette-asgi-early-demo-qahhxctqpw.now.sh/
OK, it's beginning to shape up now. Next steps:
CSV tests all pass as of https://github.com/simonw/datasette/commit/ff9efa668ebc33f17ef9b30139960e29906a18fb
This code could be a lot neater though. At the very least I'm going to refactor datasette/utils.py
into a datasette/utils
package and put all of my new ASGI utilities in datasette/utils/asgi.py
The way I implemented streaming on top of a writer object (inspired by Sanic) is a bit of a weird hack. I think I'd rather use an abstraction where my view functions can yield chunks of body data.
Next test to fix (because by new test harness doesn't actually obey the allow_redirects=
parameter):
_____________ test_database_page_redirects_with_url_hash _____________
app_client_with_hash = <tests.fixtures.TestClient object at 0x10981f240>
def test_database_page_redirects_with_url_hash(app_client_with_hash):
response = app_client_with_hash.get("/fixtures", allow_redirects=False)
assert response.status == 302
response = app_client_with_hash.get("/fixtures")
> assert "fixtures" in response.text
E AssertionError: assert 'fixtures' in ''
E + where '' = <tests.fixtures.TestResponse object at 0x10981f550>.text
All of the tests are now passing!
I still need a solution for this:
I think the answer is ASGI lifespan, which is supported by Uvicorn. https://asgi.readthedocs.io/en/latest/specs/lifespan.html#startup
I also need to actually take advantage of raw_path
such that pages like https://fivethirtyeight.datasettes.com/fivethirtyeight/twitter-ratio%2Fsenators can be correctly served.
Tests are failing on Python 3.5: https://travis-ci.org/simonw/datasette/jobs/549380098 - error is TypeError: the JSON object must be str, not 'bytes'
And now the tests are all passing!
Still to do:
raw_path
so table names containing /
can work correctly@app.listener("before_server_start")
I'm going to move the remaining work into a pull request.
It's alive! Here's the first deployed version: https://a559123.datasette.io/
You can confirm it's running under ASGI by viewing https://a559123.datasette.io/-/versions and looking for the "asgi"
key.
Compare to the last version of master running on Sanic here: http://aa91112.datasette.io/
I wrote about this on my blog: https://simonwillison.net/2019/Jun/23/datasette-asgi/
Datasette doesn't take much advantage of Sanic, and I'm increasingly having to work around parts of it because of idiosyncrasies that are specific to Datasette - caring about the exact order of querystring arguments for example.
Since Datasette is GET-only our needs from a web framework are actually pretty slim.
This becomes more important as I expand the plugins #14 framework. Am I sure I want the plugin ecosystem to depend on a Sanic if I might move away from it in the future?
If Datasette wasn't all about async/await I would use WSGI, but today it makes more sense to use ASGI. I'd like to be confident that switching to ASGI would still give me the excellent performance that Sanic provides.
https://github.com/django/asgiref/blob/master/specs/asgi.rst