simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0
9.54k stars 687 forks source link

Port Datasette to ASGI #272

Closed simonw closed 5 years ago

simonw commented 6 years ago

Datasette doesn't take much advantage of Sanic, and I'm increasingly having to work around parts of it because of idiosyncrasies that are specific to Datasette - caring about the exact order of querystring arguments for example.

Since Datasette is GET-only our needs from a web framework are actually pretty slim.

This becomes more important as I expand the plugins #14 framework. Am I sure I want the plugin ecosystem to depend on a Sanic if I might move away from it in the future?

If Datasette wasn't all about async/await I would use WSGI, but today it makes more sense to use ASGI. I'd like to be confident that switching to ASGI would still give me the excellent performance that Sanic provides.

https://github.com/django/asgiref/blob/master/specs/asgi.rst

simonw commented 6 years ago

I think I can do this almost entirely within my existing BaseView class structure.

First, decouple the async data() methods by teaching them to take a querystring object as an argument instead of a Sanic request object. The get() method can then send that new object instead of a request.

Next teach the base class how to obey the ASGI protocol.

I should be able to get support for both Sanic and uvicorn/daphne working in the same codebase, which will make it easy to compare their performance.

simonw commented 6 years ago

Thinking about this further, maybe I should embrace ASGI turtles-all-the-way-down and teach each datasette view class to take a scope to the constructor and act entirely as an ASGI component. Would be a nice way of diving deep into ASGI and I can add utility helpers for things like querystring evaluation as I need them.

simonw commented 6 years ago

I built this ASGI debugging tool to help with this migration: https://asgi-scope.now.sh/fivethirtyeight-34d6604/most-common-name%2Fsurnames.json?foo=bar&bazoeuto=onetuh&a=.

simonw commented 6 years ago

Results of an extremely simple micro-benchmark comparing the two shows that uvicorn is at least as fast as Sanic (benchmarks a little faster with a very simple payload): https://gist.github.com/simonw/418950af178c01c416363cc057420851

simonw commented 6 years ago

https://github.com/encode/uvicorn/blob/572b5fe6c811b63298d5350a06b664839624c860/uvicorn/run.py#L63 is how you start a Uvicorn server from code as opposed to the uvicorn CLI

from uvicorn.run import UvicornServer
UvicornServer().run(app, host=host, port=port)
simonw commented 6 years ago

This looks VERY relevant: https://github.com/encode/starlette

tomchristie commented 6 years ago

I’m up for helping with this.

Looks like you’d need static files support, which I’m planning on adding a component for. Anything else obviously missing?

For a quick overview it looks very doable - the test client ought to me your test cases stay roughly the same.

Are you using any middleware or other components for the Sanic ecosystem? Do you use cookies or sessions at all?

simonw commented 6 years ago

No cookies or sessions - no POST requests in fact, Datasette just cares about GET (path and querystring) and being able to return custom HTTP headers.

tomchristie commented 6 years ago

Okay. I reckon the latest version should have all the kinds of components you'd need:

Recently added ASGI components for Routing and Static Files support, as well as making few tweaks to make sure requests and responses are instantiated efficiently.

Don't have any redirect-to-slash / redirect-to-non-slash stuff out of the box yet, which it looks like you might miss.

simonw commented 6 years ago

I'm now hacking around with an initial version of this in the starlette branch.

Here's my work in progress, deployed using datasette publish now fixtures.db -n datasette-starlette-demo --branch=starlette --extra-options="--asgi"

https://datasette-starlette-demo.now.sh/

Lots more work to do - the CSS isn't being served correctly for example, it's showing this error when I hit /-/static/app.css:

INFO: 127.0.0.1 - "GET /-/static/app.css HTTP/1.1" 200
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 363, in run_asgi
    result = await asgi(self.receive, self.send)
  File "/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/starlette/staticfiles.py", line 91, in __call__
    await response(receive, send)
  File "/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/starlette/response.py", line 180, in __call__
    {"type": "http.response.body", "body": chunk, "more_body": False}
  File "/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 483, in send
    raise RuntimeError("Response content shorter than Content-Length")
RuntimeError: Response content shorter than Content-Length
simonw commented 6 years ago

It looks like that's a bug in Starlette - filed here: https://github.com/encode/starlette/issues/32

simonw commented 6 years ago

Tom shipped my fix for that bug already, so https://datasette-starlette-demo.now.sh/ is now serving CSS!

simonw commented 6 years ago

Refs https://github.com/encode/uvicorn/issues/168

tomchristie commented 6 years ago

Some notes:

simonw commented 5 years ago

Wow, this issue has been open for a full year now!

I've been thinking about this a lot. I've decided I want Datasette to use ASGI 3.0 internally with no dependencies on anything else - then I want the option to run Datasette under both daphne and uvicorn - because uvicorn doesn't support Python 3.5 but Datasette still needs to (primarily for Glitch), and daphne works with 3.5.

So I'm going to try to go the following route:

simonw commented 5 years ago

I said earlier that I only need to support GET - I actually need to be able to support POST too, mainly to support plugins (e.g. a plugin that allows authenticated login before you can view Datasette, but potentially also plugins that let you write data directly to SQLite as well).

simonw commented 5 years ago

While I'm not depending on Starlette any more I will need to instead depend on https://github.com/andrew-d/python-multipart for POST form parsing - as used by Starlette here https://github.com/encode/starlette/blob/ab86530eddfcf56e0f7e5ca56f6ab69c15594a7d/starlette/requests.py#L178-L193

simonw commented 5 years ago

Bah, I'd much rather depend on Starlette for things like form parsing - but it's 3.6+ only!

https://github.com/encode/starlette/blob/ab86530eddfcf56e0f7e5ca56f6ab69c15594a7d/setup.py#L39

Maybe I could require Python 3.6 or higher if you want to handle POST data? This would make my internals far too complicated though I think.

simonw commented 5 years ago

https://github.com/simonw/datasette/commit/9fdb47ca952b93b7b60adddb965ea6642b1ff523 added decode_path_component() and encode_path_component() functions because ASGI decodes %2F encoded slashes in URLs automatically. The new encoding scheme looks like this:

"table/and/slashes" => "tableU+002FandU+002Fslashes"
"~table" => "U+007Etable"
"+bobcats!" => "U+002Bbobcats!"
"U+007Etable" => "UU+002B007Etable"

For background see this comment: https://github.com/django/asgiref/issues/51#issuecomment-450603464

tomchristie commented 5 years ago

Useful context stuff:

ASGI decodes %2F encoded slashes in URLs automatically

raw_path for ASGI looks to be under consideration: https://github.com/django/asgiref/issues/87

uvicorn doesn't support Python 3.5

That was an issue specifically against the <=3.5.2 minor point releases of Python, now resolved: https://github.com/encode/uvicorn/issues/330 👍

Starlette for things like form parsing - but it's 3.6+ only!

Yeah - the bits that require 3.6 are anywhere with the "async for" syntax. If it wasn't for that I'd downport it, but that one's a pain. It's the one bit of syntax to watch out for if you're looking to bring any bits of implementation across to Datasette.

simonw commented 5 years ago

OK, time for a solid implementation plan.

As soon as https://github.com/django/asgiref/pull/92 is merged (hopefully very soon) the ASGI spec will have support for an optional raw_path - which means we can continue to use table%2Fnames with embedded / without being unable to tell if a path has been decoded or not.

Steps to implement:

Refactor classes, then add .asgi() method to BaseView

Add a .asgi(self, scope, receive, send) method to my base view class. This will expose an ASGI interface to the outside world: the method itself will construct a request object and call the existing .get() method.

My only true shared base class is actually RenderMixin because the IndexView doesn't extend BaseView. I'm going to refactor the class hierarchy a bit here - AsgiView will be my top level class with the .asgi() method on it. RenderMixin will be renamed BaseView(AsgiView), while existing BaseView will be renamed DataView(BaseView) since it mainly exists to introduce the handy .data() abstraction.

So...

Extract routing logic out into a new DatasetteView

I considered calling this RouteView, but one of the goals of this project is to allow other ASGI apps to import Datasette itself and reuse it as its own ASGI function.

So DatasetteView will subclass BaseView and will do all of the routing logic. That logic currently lives here:

https://github.com/simonw/datasette/blob/aa911122feab13f8e65875c98edb00fd3832b7b8/datasette/app.py#L594-L640

For tests: Implement a version of app_client.get() that calls ASGI instead

Almost all of the unit tests currently use app_client.get("/path..."). I want to be able to run tests against both ASGI and existing-Sanic, so for the moment I'm going to teach app_client.get() to use ASGI instead but only in the presence of a new environment variable. I can then have Travis run the tests twice - once with that environement variable and once without.

Make datasette serve --asgi run ASGI and uvicorn

Uvicorn supports Python 3.5 again as of https://github.com/encode/uvicorn/issues/330 - so it's going to be the new dependency for Datasette.

Do some comparative testing of ASGI and non-ASGI

Just some sanity checking to make sure there aren't any weird issues.

Final step: refactor out Sanic

Hopefully this will just involve changes being made to the AsgiView base class, since that subclasses Sanic HTTPMethodView.

Bonus: It looks like dropping Sanic as a dependency in favour of Uvicorn should give us Windows support! https://github.com/encode/uvicorn/issues/82

simonw commented 5 years ago

I'll probably revert 9fdb47ca952b93b7b60adddb965ea6642b1ff523 from https://github.com/simonw/datasette/issues/272#issuecomment-494192779 since I won't need it now that ASGI is getting raw_path support.

simonw commented 5 years ago

For reference, here's some WIP code I wrote last year against the old ASGI 2 spec: https://github.com/simonw/datasette/commit/4fd36ba2f3f91da7258859808616078e3464fb97

simonw commented 5 years ago

For the routing component: I'm going to base my implementation on the one from Django Channels.

https://github.com/django/channels/blob/507cb54fcb36df63282dd19653ea743036e7d63c/channels/routing.py#L123-L149

Documented here: https://channels.readthedocs.io/en/latest/topics/routing.html#urlrouter

Particularly relevant: my view classes need access to the components that were already parsed out of the URL by the router. I'm going to copy the Django Channels mechanism of stashing those in scope["url_route"]["kwargs"].

simonw commented 5 years ago

Started sketching out the router in the asgi branch: https://github.com/simonw/datasette/commit/7cdc55c6836fe246b1ca8a13a965a39991c9ffec

simonw commented 5 years ago

I have an open pull request to Uvicorn with an implementation of raw_path: https://github.com/encode/uvicorn/pull/372

simonw commented 5 years ago

How should file serving work?

Starlette and Sanic both use aiofiles - https://github.com/Tinche/aiofiles - which is a small wrapper around file operations which runs them all in an executor thread. It doesn't have any C dependencies so it looks like a good option. Quart uses it too.

aiohttp does things differently: it has an implementation based on sendfile with an alternative fallback which reads chunks from a file object and yields them one chunk at a time,

simonw commented 5 years ago

Uvicorn 0.8.1 is our and supports raw_path!

simonw commented 5 years ago

I need to be able to define the URL routes once and have them work for both Sanic and ASGI.

I'm going to extract the web application bits out of the Datasette class into a DatasetteServer class. Then I can have a add_route() method on that class, then have DatasetteSanic and DatasetteAsgi subclasses which redefine that method.

simonw commented 5 years ago

Getting this to work with both Sanic AND ASGI at the same time (via the classes described previously with an --asgi command-line option) is proving way trickier than I expected, mainly because of the complexity of the current Datasette.app() method.

I'm going to drop the compatibility path for a bit and see if I can make progress on a pure-ASGI port.

simonw commented 5 years ago

Lots still to do:

I'm going to work on getting the unit test framework to be ASGI-compatible next.

simonw commented 5 years ago

Published an in-progress demo:

datasette publish now fixtures.db -n datasette-asgi-early-demo --branch=asgi

Here it is: https://datasette-asgi-early-demo-qahhxctqpw.now.sh/

simonw commented 5 years ago

OK, it's beginning to shape up now. Next steps:

simonw commented 5 years ago

CSV tests all pass as of https://github.com/simonw/datasette/commit/ff9efa668ebc33f17ef9b30139960e29906a18fb

This code could be a lot neater though. At the very least I'm going to refactor datasette/utils.py into a datasette/utils package and put all of my new ASGI utilities in datasette/utils/asgi.py

The way I implemented streaming on top of a writer object (inspired by Sanic) is a bit of a weird hack. I think I'd rather use an abstraction where my view functions can yield chunks of body data.

simonw commented 5 years ago

Next test to fix (because by new test harness doesn't actually obey the allow_redirects= parameter):

_____________ test_database_page_redirects_with_url_hash _____________

app_client_with_hash = <tests.fixtures.TestClient object at 0x10981f240>

    def test_database_page_redirects_with_url_hash(app_client_with_hash):
        response = app_client_with_hash.get("/fixtures", allow_redirects=False)
        assert response.status == 302
        response = app_client_with_hash.get("/fixtures")
>       assert "fixtures" in response.text
E       AssertionError: assert 'fixtures' in ''
E        +  where '' = <tests.fixtures.TestResponse object at 0x10981f550>.text
simonw commented 5 years ago

All of the tests are now passing!

I still need a solution for this:

https://github.com/simonw/datasette/blob/5bd510b01adae3f719e4426b9bfbc346a946ba5c/datasette/app.py#L706-L714

I think the answer is ASGI lifespan, which is supported by Uvicorn. https://asgi.readthedocs.io/en/latest/specs/lifespan.html#startup

simonw commented 5 years ago

I also need to actually take advantage of raw_path such that pages like https://fivethirtyeight.datasettes.com/fivethirtyeight/twitter-ratio%2Fsenators can be correctly served.

simonw commented 5 years ago

Tests are failing on Python 3.5: https://travis-ci.org/simonw/datasette/jobs/549380098 - error is TypeError: the JSON object must be str, not 'bytes'

simonw commented 5 years ago

And now the tests are all passing!

Still to do:

simonw commented 5 years ago

I'm going to move the remaining work into a pull request.

simonw commented 5 years ago

It's alive! Here's the first deployed version: https://a559123.datasette.io/

You can confirm it's running under ASGI by viewing https://a559123.datasette.io/-/versions and looking for the "asgi" key.

Compare to the last version of master running on Sanic here: http://aa91112.datasette.io/

simonw commented 5 years ago

I wrote about this on my blog: https://simonwillison.net/2019/Jun/23/datasette-asgi/