Closed: simonw closed this issue 4 years ago
Could be as simple as `response = await datasette.get("/path/blah")` - which could also be re-used by the implementation of the `datasette --get /` CLI option introduced in #927.
Bit weird calling it `.get()` since that clashes with Python's dictionary `.get()` method.
Should it default to treating things as if they had the `.json` extension? There are use-cases for the non-JSON method, such as https://github.com/natbat/tidepools_near_me/commit/ec102c6da5a5d86f17628740d90b6365b671b5e1

I think I'm OK with people having to add `.json` to their internal calls. Maybe they could accept `format="json"` as an optional parameter, which would automatically handle the very weird edge-cases where you need to use `?_format=json` instead of `.json` (due to table names existing with a `.json` suffix).
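The edge-case logic could look something like this - a hypothetical sketch, where `path_with_format` is an invented helper rather than a real Datasette function:

```python
from urllib.parse import quote

def path_with_format(database, table, format="json"):
    # Hypothetical helper: choose between the ".json" suffix and
    # "?_format=json", falling back to the query-string form when the
    # table name itself already ends in ".json".
    path = "/{}/{}".format(quote(database, safe=""), quote(table, safe=""))
    if format == "json":
        if table.endswith(".json"):
            return path + "?_format=json"
        return path + ".json"
    return path

print(path_with_format("fixtures", "facetable"))  # /fixtures/facetable.json
print(path_with_format("fixtures", "data.json"))  # /fixtures/data.json?_format=json
```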
Alternative name possibilities:

- `datasette.http_get(...)` - slightly misleading since it's not going over the HTTP protocol
- `datasette.internal_get(...)` - the `internal_` prefix might suggest it's not an API for external use, which isn't true - it's for plugins
- `datasette.get(...)` - clashes with `dict.get()`, but I'm not at all sure that's a good reason not to use it

Actually no - `requests.get()` and `httpx.get()` prove that having a `.get()` method for an HTTP-related API isn't confusing to people at all.
`datasette.get()` it is. (I'll probably add `datasette.post()` in the future too.)
Should internal requests executed in this way be handled by plugins that use the `asgi_wrapper()` hook?
Hard to be sure one way or the other. I'm worried about logging middleware triggering twice - but actually anyone doing serious logging of their Datasette instance is probably doing it in a different layer (uvicorn logs or nginx proxy or whatever) so they wouldn't be affected. There aren't any ASGI logging middlewares out there that I've seen.
Also: if you run into a situation where your stuff is breaking because `datasette.get()` is calling ASGI middleware twice, you can fix it by running your ASGI middleware outside of the `asgi_wrapper` plugin hook mechanism.

So I think it DOES execute `asgi_wrapper()` middleware.
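To make the double-execution question concrete, here's a toy stand-in for an `asgi_wrapper()`-style middleware (not Datasette's actual hook wiring): if an internal `datasette.get()` call re-enters the wrapped application, the middleware simply fires once per request.

```python
import asyncio

def counting_wrapper(app, counter):
    # Minimal ASGI middleware: count every request that passes through.
    async def wrapped(scope, receive, send):
        counter["requests"] += 1
        await app(scope, receive, send)
    return wrapped

async def inner_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})

counter = {"requests": 0}
app = counting_wrapper(inner_app, counter)

async def demo():
    async def receive():
        return {"type": "http.request"}
    messages = []
    async def send(message):
        messages.append(message)
    # The "external" request, plus one simulated internal request:
    await app({"type": "http"}, receive, send)
    await app({"type": "http"}, receive, send)
    return counter["requests"]

print(asyncio.run(demo()))  # 2 - the middleware ran for both requests
```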
What about authentication checks etc? Won't they run twice?
I think that's OK too - in fact it's desirable: think of the case of `datasette-graphql`, where a bunch of different TableView calls are made as part of the same GraphQL query. Having those calls take advantage of finely grained per-table authentication and permission checks seems like a good feature.
Right now calling `datasette.app()` instantiates an ASGI application - complete with a bunch of routes and wrappers - and returns that application object. Calling it twice instantiates another ASGI application.

I think a single `Datasette` instance should only ever create a single ASGI app - so the `.app()` method should cache the ASGI app that it returns the first time and return the same application again on future calls.
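A sketch of that caching, with illustrative names - `_build_asgi_app` here stands in for the real route-and-wrapper construction:

```python
class Datasette:
    # Sketch only: build the ASGI app the first time .app() is called,
    # then return the same object on every subsequent call.
    def __init__(self):
        self._app = None
        self.build_count = 0  # just to demonstrate single construction

    def app(self):
        if self._app is None:
            self._app = self._build_asgi_app()
        return self._app

    def _build_asgi_app(self):
        # Stand-in for instantiating routes and wrappers.
        self.build_count += 1
        async def asgi(scope, receive, send):
            pass
        return asgi

ds = Datasette()
assert ds.app() is ds.app()
print(ds.build_count)  # 1
```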
One thing to consider here: Datasette's table and database name escaping rules can be a little bit convoluted. If a plugin wants to get back the first five rows of a table, it will need to construct a URL `/dbname/tablename?_size=5` - but it will need to know how to turn the database and table names into the correctly escaped `dbname` and `tablename` values.
Here's how the `row.html` template handles that right now: https://github.com/simonw/datasette/blob/b21ed237ab940768574c834aa5a7130724bd3a2d/datasette/templates/row.html#L19-L23

It would be an improvement to have this logic abstracted out somewhere and documented so plugins can use it.
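One possible shape for that abstraction - a hypothetical sketch, since Datasette's real escaping rules may differ from plain percent-encoding:

```python
from urllib.parse import quote, urlencode

def table_url(database, table, **params):
    # Hypothetical helper: percent-encode the database and table names,
    # then append any query-string parameters.
    path = "/{}/{}".format(quote(database, safe=""), quote(table, safe=""))
    if params:
        path += "?" + urlencode(params)
    return path

print(table_url("fixtures", "facetable", _size=5))  # /fixtures/facetable?_size=5
print(table_url("db", "has/slash"))                 # /db/has%2Fslash
```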
Maybe allow this:

`response = await datasette.get("/{database}/{table}.json", database=database, table=table)`

This could cause problems if users ever need to pass a literal `{` in their paths. Maybe allow this too:

`response = await datasette.get("/{database}/{table}.json", interpolate=False)`

Not convinced this is useful - it's a bit unintuitive.
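If the interpolation idea went ahead, it could be a thin layer over `str.format` that escapes each value before substituting it - a sketch of the idea, not a committed design:

```python
from urllib.parse import quote

def interpolate_path(template, **kwargs):
    # Escape every supplied value before substituting it into the path
    # template, so table names containing "/" or "%" stay safe.
    return template.format(
        **{key: quote(str(value), safe="") for key, value in kwargs.items()}
    )

print(interpolate_path("/{database}/{table}.json", database="fixtures", table="foo/bar"))
# /fixtures/foo%2Fbar.json
```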
I just realised that this mechanism is kind of like being able to use microservices - make API calls within your application - except that everything runs in the same process against SQLite databases so calls will be lightning fast.
It also means that a plugin can add a new internal API to Datasette that's accessible to other plugins by registering a new route with `register_routes`!

Also fun: the inevitable plugin that exposes this to the template language - so Datasette templates can stitch together data from multiple other internal API calls. Fun way to take advantage of `async` support in Jinja.
Need to decide what to do about JSON responses.
When called from a template it's likely the intent will be to further loop through the JSON data returned. It would be annoying to have to run `json.loads` here.

Maybe a `.get_json()` method then? Or even return a response that has `.json()` and `.text`, similar to `httpx` - or just return an `httpx` response.

I'm leaning towards defaulting to JSON as the requested format - you can pass `format="html"` if you want HTML. But it's weird that it would behave differently from the web UI. Maybe `.get` vs `.get_html`?
I'm not going to mess around with formats - you'll get back the exact response that a web client would receive.
Question: what should the response object look like? e.g. if you do:
`response = await datasette.get("/db/table.json")`

What should `response` be?
I could reuse the Datasette `Response` class from `datasette.utils.asgi`. This would work well for regular responses, which just have a status code, some headers and a response body. It wouldn't be great for streaming responses though, such as you get back from `?_stream=1` CSV exports.
So what should I do about streaming responses?

- I could deliberately ignore them - throw an exception if you attempt to run `await datasette.get(...)` against a streaming URL.
- I could load the entire response into memory and return it as a wrapped object.
- I could support some kind of asynchronous iterator mechanism. This would be pretty elegant if I could decide the right syntax for it - it would allow plugins to take advantage of other internal URLs that return streaming content without needing to load that content entirely into memory in order to process it.
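The asynchronous iterator option might look roughly like this - a toy sketch with a fake chunk source standing in for a real transport:

```python
import asyncio

class StreamingResponse:
    # Hypothetical response object: chunks are produced lazily, so the
    # caller never has to hold the whole body in memory.
    def __init__(self, chunks):
        self._chunks = chunks

    async def aiter_bytes(self):
        for chunk in self._chunks:
            await asyncio.sleep(0)  # yield to the event loop, as real I/O would
            yield chunk

async def consume(response):
    collected = []
    async for chunk in response.aiter_bytes():
        collected.append(chunk)
    return b"".join(collected)

body = asyncio.run(consume(StreamingResponse([b"a,b\n", b"1,2\n", b"3,4\n"])))
print(body)  # b'a,b\n1,2\n3,4\n'
```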
Maybe these methods become the way most Datasette tests are written, replacing the existing `TestClient` mechanism?
I'm tempted to create an `await datasette.request()` method which can take any HTTP verb - then have `datasette.get()` and `datasette.post()` as thin wrappers around it.
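That verb-agnostic design is simple to sketch - here the transport is a stub standing in for the real ASGI dispatch:

```python
import asyncio

class InternalClient:
    # Sketch: request() takes any HTTP verb; get()/post() are thin wrappers.
    def __init__(self, handler):
        self._handler = handler  # stub transport for illustration

    async def request(self, method, path, **kwargs):
        return await self._handler(method, path, **kwargs)

    async def get(self, path, **kwargs):
        return await self.request("GET", path, **kwargs)

    async def post(self, path, **kwargs):
        return await self.request("POST", path, **kwargs)

async def echo(method, path, **kwargs):
    # Fake handler that just reports what it was asked to do.
    return (method, path)

client = InternalClient(echo)
print(asyncio.run(client.get("/db/table.json")))  # ('GET', '/db/table.json')
```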
What if `datasette.get()` was an alias for `httpx.get()`, pre-configured to route to the correct application? And with some sugar that added `http://localhost/` to the beginning of the path if it was missing?

This would make `httpx` a dependency of core Datasette, which I think is OK. It would also solve the return type problem: I would return whatever `httpx` returns.
I could solve streaming using something like this:

```python
async with datasette.stream(
    "GET", "/fixtures/compound_three_primary_keys.csv?_stream=on&_size=max"
) as response:
    async for chunk in response.aiter_bytes():
        print(chunk)
```

Which would be a wrapper around `AsyncClient.stream(method, url, ...)` from https://www.python-httpx.org/async/#streaming-responses
I think I can use `async with httpx.AsyncClient(base_url="http://localhost/") as client:` to ensure I don't need to use `http://localhost/` on every call.
Maybe instead of implementing `datasette.get()` and `datasette.post()` and `datasette.request()` and `datasette.stream()` I could instead have a nested object called `datasette.client` which is a preconfigured `AsyncClient` instance:

`response = await datasette.client.get("/")`

Or perhaps this should be a method, in case I ever need to be able to `await` it:

`response = await (await datasette.client()).get("/")`

This is a bit cosmetically ugly though - I'd rather avoid that if possible.
Maybe I could get this working by returning an object from `.client()` which provides an `await obj.get()` method:

`response = await datasette.client().get("/")`

I don't think there's any benefit to that over `await datasette.client.get()` though.
Should I instantiate a single `Client` and reuse it for all internal requests, or can I instantiate a new `Client` for each request?
https://www.python-httpx.org/advanced/#why-use-a-client says that the main benefit of a Client instance is HTTP connection pooling - which isn't an issue for these internal requests since they won't be using the HTTP protocol at all, they'll be calling the ASGI application directly.
So I'm leaning towards instantiating a fresh client for every internal request. I'll run a microbenchmark to check that this doesn't have any unpleasant performance implications.
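Such a microbenchmark could use `timeit` - shown here with a trivial stand-in class, since the real comparison would construct `httpx.AsyncClient` objects:

```python
import timeit

class FakeClient:
    # Stand-in for httpx.AsyncClient, so this sketch runs without
    # third-party dependencies; real numbers would come from httpx itself.
    def __init__(self, app=None):
        self.app = app

def fresh_each_time():
    # Instantiate a new client per call.
    return FakeClient(app=object())

shared = FakeClient(app=object())

def reuse_shared():
    # Reuse one pre-built client.
    return shared

fresh_s = timeit.timeit(fresh_each_time, number=100_000)
reuse_s = timeit.timeit(reuse_shared, number=100_000)
print(f"fresh: {fresh_s:.4f}s  reused: {reuse_s:.4f}s")
```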
dogsheep-beta could do with this too. It currently makes a call to `TableView` in a similar way to `datasette-graphql` in order to calculate facets.

`dogsheep-beta` would benefit from a mechanism for changing the facet timeout setting during that call (as would `datasette-graphql` - see the `DatasetteSpecialConfig` mechanism it uses).
I put together a minimal prototype of this and it feels pretty good:
```diff
diff --git a/datasette/app.py b/datasette/app.py
index 20aae7d..fb3bdad 100644
--- a/datasette/app.py
+++ b/datasette/app.py
@@ -4,6 +4,7 @@ import collections
 import datetime
 import glob
 import hashlib
+import httpx
 import inspect
 import itertools
 from itsdangerous import BadSignature
@@ -312,6 +313,7 @@ class Datasette:
         self._register_renderers()
         self._permission_checks = collections.deque(maxlen=200)
         self._root_token = secrets.token_hex(32)
+        self.client = DatasetteClient(self)

     async def invoke_startup(self):
         for hook in pm.hook.startup(datasette=self):
@@ -1209,3 +1211,25 @@ def route_pattern_from_filepath(filepath):

 class NotFoundExplicit(NotFound):
     pass
+
+
+class DatasetteClient:
+    def __init__(self, ds):
+        self.app = ds.app()
+
+    def _fix(self, path):
+        if path.startswith("/"):
+            path = "http://localhost{}".format(path)
+        return path
+
+    async def get(self, path, **kwargs):
+        async with httpx.AsyncClient(app=self.app) as client:
+            return await client.get(self._fix(path), **kwargs)
+
+    async def post(self, path, **kwargs):
+        async with httpx.AsyncClient(app=self.app) as client:
+            return await client.post(self._fix(path), **kwargs)
+
+    async def options(self, path, **kwargs):
+        async with httpx.AsyncClient(app=self.app) as client:
+            return await client.options(self._fix(path), **kwargs)
```
Used like this in `ipython`:

```
In [1]: from datasette.app import Datasette

In [2]: ds = Datasette(["fixtures.db"])

In [3]: (await ds.client.get("/-/config.json")).json()
Out[3]:
{'default_page_size': 100,
 'max_returned_rows': 1000,
 'num_sql_threads': 3,
 'sql_time_limit_ms': 1000,
 'default_facet_size': 30,
 'facet_time_limit_ms': 200,
 'facet_suggest_time_limit_ms': 50,
 'hash_urls': False,
 'allow_facet': True,
 'allow_download': True,
 'suggest_facets': True,
 'default_cache_ttl': 5,
 'default_cache_ttl_hashed': 31536000,
 'cache_size_kb': 0,
 'allow_csv_stream': True,
 'max_csv_mb': 100,
 'truncate_cells_html': 2048,
 'force_https_urls': False,
 'template_debug': False,
 'base_url': '/'}

In [4]: (await ds.client.get("/fixtures/facetable.json?_shape=array")).json()
Out[4]:
[{'pk': 1,
  'created': '2019-01-14 08:00:00',
  'planet_int': 1,
  'on_earth': 1,
  'state': 'CA',
  'city_id': 1,
  'neighborhood': 'Mission',
  'tags': '["tag1", "tag2"]',
  'complex_array': '[{"foo": "bar"}]',
  'distinct_some_null': 'one'},
 {'pk': 2,
  'created': '2019-01-14 08:00:00',
  'planet_int': 1,
  'on_earth': 1,
  'state': 'CA',
  'city_id': 1,
  'neighborhood': 'Dogpatch',
  'tags': '["tag1", "tag3"]',
  'complex_array': '[]',
  'distinct_some_null': 'two'},
```
This adds `httpx` as a dependency - I think I'm OK with that. I use it for testing in all of my plugins anyway.
How important is it to use `httpx.AsyncClient` with a context manager? https://www.python-httpx.org/async/#opening-and-closing-clients says:

> Alternatively, use `await client.aclose()` if you want to close a client explicitly:
>
> `client = httpx.AsyncClient()` ... `await client.aclose()`

The `.aclose()` method has a comment saying "Close transport and proxies" - I'm not using proxies, so the relevant implementation seems to be a call to `await self._transport.aclose()` in https://github.com/encode/httpx/blob/f932af9172d15a803ad40061a4c2c0cd891645cf/httpx/_client.py#L1741-L1751
The transport I am using is a class called `ASGITransport` in https://github.com/encode/httpx/blob/master/httpx/_transports/asgi.py - the `aclose()` method on that class does nothing. So it looks like I can instantiate a client without bothering with the `async with httpx.AsyncClient` bit.
Even smaller `DatasetteClient` implementation:

```python
class DatasetteClient:
    def __init__(self, ds):
        self._client = httpx.AsyncClient(app=ds.app())

    def _fix(self, path):
        if path.startswith("/"):
            path = "http://localhost{}".format(path)
        return path

    async def get(self, path, **kwargs):
        return await self._client.get(self._fix(path), **kwargs)

    async def post(self, path, **kwargs):
        return await self._client.post(self._fix(path), **kwargs)

    async def options(self, path, **kwargs):
        return await self._client.options(self._fix(path), **kwargs)
```
I may as well implement all of the HTTP methods supported by the `httpx` client:

```python
class DatasetteClient:
    def __init__(self, ds):
        self._client = httpx.AsyncClient(app=ds.app())

    def _fix(self, path):
        if path.startswith("/"):
            path = "http://localhost{}".format(path)
        return path

    async def get(self, path, **kwargs):
        return await self._client.get(self._fix(path), **kwargs)

    async def options(self, path, **kwargs):
        return await self._client.options(self._fix(path), **kwargs)

    async def head(self, path, **kwargs):
        return await self._client.head(self._fix(path), **kwargs)

    async def post(self, path, **kwargs):
        return await self._client.post(self._fix(path), **kwargs)

    async def put(self, path, **kwargs):
        return await self._client.put(self._fix(path), **kwargs)

    async def patch(self, path, **kwargs):
        return await self._client.patch(self._fix(path), **kwargs)

    async def delete(self, path, **kwargs):
        return await self._client.delete(self._fix(path), **kwargs)
```
Am I going to rewrite ALL of my tests to use this instead? It would clean up a lot of test code, at the cost of quite a bit of work.
It would make for much neater plugin tests too, and neater testing documentation: https://docs.datasette.io/en/stable/testing_plugins.html
I want this in Datasette 0.50, so I can use it in `datasette-graphql` and suchlike.
Documentation (from #1006): https://docs.datasette.io/en/latest/internals.html#client
- `datasette-graphql` works by making internal requests to the TableView class (in order to take advantage of existing pagination logic, plus options like `?_search=` and `?_where=`) - see #915
- I want to support a `mod_rewrite` style mechanism for putting nicer URLs on top of Datasette pages - I botched that together for a project here using an internal ASGI proxying trick: https://github.com/natbat/tidepools_near_me/commit/ec102c6da5a5d86f17628740d90b6365b671b5e1

If the `datasette` object provided a documented method for executing internal requests (in a way that makes sense with logging etc - i.e. doesn't get logged as a separate request), both of these use-cases would be much neater.