I've been agonizing over how to make this change in a responsible way for days now. The question is, do I change the root query block to work this way instead?
So today a query like this:
{
  repos {
    nodes {
      id
      full_name
      description
    }
  }
}
Becomes this instead:
{
  github {
    repos {
      nodes {
        id
        full_name
        description
      }
    }
  }
}
Making this change would break all existing datasette-graphql deployments. There aren't many of them yet, so it's not quite too late to do that, but it's getting close.
If I want to preserve backwards compatibility there are a few ways I could do it:
- If a datasette-graphql instance has just one DB, keep the current format (expose all tables on the root). That gets weird if you attach a second DB and suddenly your root format changes.
- Expose database names on the root alongside the tables. If any table names collide with a database name, either add a _2 suffix to them or don't include them and tell people they should access them via the database root instead.

If I was designing this plugin from scratch today, what would I do here?
I think I'd do database names as the root nodes. I'm slightly unhappy with the extra verbosity considering that the vast majority of Datasette instances only attach a single database.
That verbosity argument is convincing to me: I don't like forcing every user of this plugin to forever include the name of their root database (which might be something boring like data.db) on every query they ever make:
{
  data {
    repos {
      nodes {
        id
        full_name
        description
      }
    }
  }
}
So I think I should optimize for the single-database case - which the plugin supports today - but also provide the ability to query additional databases when more than one database is connected.
So a plan for that could be: all tables from the first connected database are exposed on the root. Any other connected databases become available as de-duplicated root nodes (so as not to collide with any table names) - maybe even with a _db suffix, as sketched below. A plugin configuration setting can turn this feature on or off.
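Here's a rough sketch of how that de-duplication could work. Everything in it is illustrative - root_field_names, its return format and the exact suffix rule are just ways of showing the idea, not the plugin's actual API:

def root_field_names(first_db_tables, other_db_names):
    # Tables from the first connected database claim the root names.
    fields = {table: ("table", table) for table in first_db_tables}
    # Every other database gets its own root node, renamed with a
    # _db suffix if it would collide with an existing table name.
    for db_name in other_db_names:
        candidate = db_name
        if candidate in fields:
            candidate = "{}_db".format(db_name)
        fields[candidate] = ("database", db_name)
    return fields

# A second database called "repos" collides with the repos table and
# becomes repos_db, while "analytics" keeps its own name:
root_field_names(["repos", "users"], ["repos", "analytics"])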
After prototyping this for a bit, I'm not sure this feature is worthwhile.
I'm getting errors like this:
File "/Users/simon/.local/share/virtualenvs/datasette-graphql-n1OSJCS8/lib/python3.8/site-packages/graphene/types/typemap.py", line 97, in graphene_reducer
assert _type.graphene_type == type, (
AssertionError: Found different types with the same name in the schema: usersCollection, usersCollection.
So I'll need to de-dupe the names of individual field types across the entire schema.
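The plugin already has a Namer class for this (its tail is visible in the diff below), but it is applied per-database today. Sharing a single namer across every database would fix the collision; a rough sketch of the idea, with details that are illustrative rather than the exact implementation:

class Namer:
    # Hands out unique names, appending a numeric suffix on collision.
    def __init__(self):
        self.names = set()

    def name(self, proposed):
        value = proposed
        suffix = 1
        while value in self.names:
            suffix += 1
            value = "{}_{}".format(proposed, suffix)
        self.names.add(value)
        return value

namer = Namer()
namer.name("usersCollection")  # "usersCollection"
namer.name("usersCollection")  # "usersCollection_2" - the second database's users table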
Much more concerning: I'm seeing 15s load times on building the schema once I attach a few different database files! This 15s delay affects everything, since the GraphQL code introspects the schema on every HTTP request.
I could fix this by caching the generated schema, but then I'd need to figure out how to invalidate that cache when the underlying database schema changes.
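One sketch of an invalidation strategy: SQLite bumps its PRAGMA schema_version counter on any DDL change, so the combined versions of every attached database could act as the cache key. build_schema here is a hypothetical stand-in for the expensive schema construction step:

_schema_cache = {}

async def cached_schema(datasette, build_schema):
    # Key the cache on (database name, schema_version) pairs.
    versions = []
    for name, db in datasette.databases.items():
        result = await db.execute("PRAGMA schema_version")
        versions.append((name, result.first()[0]))
    key = tuple(versions)
    if key not in _schema_cache:
        _schema_cache.clear()  # any schema change discards stale entries
        _schema_cache[key] = await build_schema(datasette)
    return _schema_cache[key]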
Here's where I got to with the prototype:
diff --git a/datasette_graphql/__init__.py b/datasette_graphql/__init__.py
index bef59a8..574b6a7 100644
--- a/datasette_graphql/__init__.py
+++ b/datasette_graphql/__init__.py
@@ -1,11 +1,12 @@
 from click import ClickException
 from datasette import hookimpl
 from datasette.utils.asgi import Response, NotFound
+import graphene
 from graphql.execution.executors.asyncio import AsyncioExecutor
 from graphql.error import format_error
 from graphql import graphql, print_schema
 import json
-from .utils import schema_for_database
+from .utils import query_for_database


 async def post_body(request):
@@ -26,7 +27,13 @@ async def view_graphql_schema(request, datasette):
         datasette.get_database(database)
     except KeyError:
         raise NotFound("Database does not exist")
-    schema = await schema_for_database(datasette, database=database)
+    db_query = await query_for_database(datasette, database=database)
+    schema = graphene.Schema(
+        query=db_query,
+        auto_camelcase=(datasette.plugin_config("datasette-graphql") or {}).get(
+            "auto_camelcase", False
+        ),
+    )
     return Response.text(print_schema(schema))
@@ -54,10 +61,27 @@ async def view_graphql(request, datasette):
         else {},
     )

-    schema = await schema_for_database(datasette, database=database)
+    # Complex logic here for merging different databases
+    db_query = await query_for_database(datasette, database=database)
+
+    db_queries = {}
+    for name in datasette.databases:
+        db_queries[name] = await query_for_database(datasette, database=name)

-    if request.args.get("schema"):
-        return Response.text(print_schema(schema))
+    root_query_properties = {
+        "{}_db".format(db_name): graphene.Field(
+            db_query, resolver=lambda parent, info: {}
+        )
+        for db_name, db_query in db_queries.items()
+    }
+    RootQuery = type("RootQuery", (graphene.ObjectType,), root_query_properties)
+
+    schema = graphene.Schema(
+        query=RootQuery,
+        auto_camelcase=(datasette.plugin_config("datasette-graphql") or {}).get(
+            "auto_camelcase", False
+        ),
+    )

     incoming = {}
     if body:
diff --git a/datasette_graphql/utils.py b/datasette_graphql/utils.py
index ae7e96c..d85de2c 100644
--- a/datasette_graphql/utils.py
+++ b/datasette_graphql/utils.py
@@ -57,7 +57,7 @@ class PageInfo(graphene.ObjectType):
     hasNextPage = graphene.Boolean()


-async def schema_for_database(datasette, database=None, tables=None):
+async def query_for_database(datasette, database=None, tables=None):
     db = datasette.get_database(database)
     hidden_tables = await db.hidden_table_names()
@@ -74,7 +74,8 @@ async def schema_for_database(datasette, database=None, tables=None):
     for table, meta in table_metadata.items():
         column_names = meta.graphql_columns.values()
-        options = list(zip(column_names, column_names))
+        column_names_for_enums = [c for c in column_names if not _is_sunder(c)]
+        options = list(zip(column_names_for_enums, column_names_for_enums))
         sort_enum = graphene.Enum.from_enum(
             Enum("{}Sort".format(meta.graphql_name), options)
         )
@@ -190,14 +191,11 @@ async def schema_for_database(datasette, database=None, tables=None):
         )

     Query = type(
-        "Query", (graphene.ObjectType,), {key: value for key, value in to_add},
-    )
-    return graphene.Schema(
-        query=Query,
-        auto_camelcase=(datasette.plugin_config("datasette-graphql") or {}).get(
-            "auto_camelcase", False
-        ),
+        "{}Query".format(db.name),
+        (graphene.ObjectType,),
+        {key: value for key, value in to_add},
     )
+    return Query


 def make_table_collection_class(table, table_class, meta):
@@ -576,3 +574,13 @@ class Namer:
             suffix += 1
         self.names.add(value)
         return value
+
+
+def _is_sunder(name):
+    """Returns True if a _sunder_ name, False otherwise."""
+    return (
+        name[0] == name[-1] == "_"
+        and name[1:2] != "_"
+        and name[-2:-1] != "_"
+        and len(name) > 2
+    )
diff --git a/tests/test_schema_for_database.py b/tests/test_schema_for_database.py
index 0a0c571..9d2b13e 100644
--- a/tests/test_schema_for_database.py
+++ b/tests/test_schema_for_database.py
@@ -1,5 +1,5 @@
 from datasette.app import Datasette
-from datasette_graphql.utils import schema_for_database
+from datasette_graphql.utils import query_for_database
 from graphql.execution.executors.asyncio import AsyncioExecutor
 from graphql import graphql
 import pytest
@@ -8,7 +8,7 @@ from .fixtures import ds, db_path


 @pytest.mark.asyncio
 async def test_schema(ds):
-    schema = await schema_for_database(ds)
+    schema = await query_for_database(ds)
     query = """{
         users {
I'm not going to implement this unless someone asks for it. I think it's too much extra implementation complexity and will encourage GraphQL schemas that are far too big.
If you have multiple databases connected, /graphql could expose a field for each one.