simonw / datasette-graphql

Datasette plugin providing an automatic GraphQL API for your SQLite databases
https://datasette-graphql-demo.datasette.io/
Apache License 2.0

Support querying multiple databases at once #44

Closed: simonw closed this issue 4 years ago

simonw commented 4 years ago

If you have multiple databases connected, /graphql could expose a field for each one.

simonw commented 4 years ago

I've been agonizing for days over how to make this change in a responsible way. The question is whether to change the root query block so that tables are nested under a field for each database.

So today a query like this:

{
  repos {
    nodes {
      id
      full_name
      description
    }
  }
}

Becomes this instead:

{
  github {
    repos {
      nodes {
        id
        full_name
        description
      }
    }
  }
}

Making this change would break all existing datasette-graphql deployments. There aren't many of them yet, so it's not quite too late to do that, but it's getting close.

If I want to preserve backwards compatibility there are a few ways I could do it:

simonw commented 4 years ago

If I were designing this plugin from scratch today, what would I do here?

I think I'd make database names the root nodes. I'm slightly unhappy with the extra verbosity, though, considering that the vast majority of Datasette instances only attach a single database.

simonw commented 4 years ago

That verbosity argument is convincing to me: I don't like forcing every user of this plugin to forever include the name of their root database (which might be something boring like data.db) on every query they ever make:

{
  data {
    repos {
      nodes {
        id
        full_name
        description
      }
    }
  }
}

So I think I should optimize for the single-database case, which is what the plugin supports today, but also provide the ability to query additional databases when more than one is connected.

simonw commented 4 years ago

So a plan for that: all tables from the first connected database are exposed on the root. Any other connected databases become available as de-duplicated root nodes (named so as not to collide with any table names), maybe even with a _db suffix. A plugin configuration setting could turn this feature on or off.
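
A rough sketch of that naming scheme (not code from the plugin; the function and its arguments here are hypothetical):

def database_root_fields(extra_database_names, reserved_names):
    # Map each extra database to a root field name with a _db suffix,
    # de-duplicated against table names already exposed on the root
    taken = set(reserved_names)
    fields = {}
    for db_name in extra_database_names:
        candidate = "{}_db".format(db_name)
        suffix = 2
        while candidate in taken:
            candidate = "{}_db_{}".format(db_name, suffix)
            suffix += 1
        taken.add(candidate)
        fields[db_name] = candidate
    return fields

database_root_fields(["github", "legislators"], {"repos", "users"})
# {'github': 'github_db', 'legislators': 'legislators_db'}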

simonw commented 4 years ago

After prototyping this for a bit, I'm not sure this feature is worthwhile.

I'm getting errors like this:

  File "/Users/simon/.local/share/virtualenvs/datasette-graphql-n1OSJCS8/lib/python3.8/site-packages/graphene/types/typemap.py", line 97, in graphene_reducer
    assert _type.graphene_type == type, (
AssertionError: Found different types with the same name in the schema: usersCollection, usersCollection.

So I'll need to de-dupe the names of individual field types across the entire schema.
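
The prototype's Namer class (visible at the end of the diff below) already does this kind of suffix-based de-duplication; the fix would be to share a single instance across every database's schema. A sketch of the idea (the exact Namer internals here are assumptions, not the plugin's implementation):

class Namer:
    # One shared instance across all databases, so the second
    # "usersCollection" gets renamed instead of colliding
    def __init__(self):
        self.names = set()

    def name(self, value):
        candidate = value
        suffix = 2
        while candidate in self.names:
            candidate = "{}_{}".format(value, suffix)
            suffix += 1
        self.names.add(candidate)
        return candidate

namer = Namer()
namer.name("usersCollection")  # 'usersCollection'
namer.name("usersCollection")  # 'usersCollection_2'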

Much more concerning: I'm seeing 15s load times building the schema once I attach a few different database files! This 15s delay affects everything, since the GraphQL code introspects the schema on every HTTP request.

I could fix this by caching the generated schema, but then I'll need to figure out how to invalidate the cache when the underlying database schema changes.
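
A minimal sketch of one invalidation strategy: key the cache on SQLite's schema_version pragma, which increments on any schema change. The cache dictionary and function name here are hypothetical; schema_for_database is the plugin's real builder:

from datasette_graphql.utils import schema_for_database

_schema_cache = {}

async def cached_schema_for_database(datasette, database=None):
    db = datasette.get_database(database)
    # PRAGMA schema_version increments whenever the schema changes,
    # so a changed database automatically misses the cache
    version = (await db.execute("PRAGMA schema_version")).first()[0]
    key = (db.name, version)
    if key not in _schema_cache:
        # Entries for older versions of this database could be evicted here
        _schema_cache[key] = await schema_for_database(datasette, database=database)
    return _schema_cache[key]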

simonw commented 4 years ago

Here's where I got to with the prototype:

diff --git a/datasette_graphql/__init__.py b/datasette_graphql/__init__.py
index bef59a8..574b6a7 100644
--- a/datasette_graphql/__init__.py
+++ b/datasette_graphql/__init__.py
@@ -1,11 +1,12 @@
 from click import ClickException
 from datasette import hookimpl
 from datasette.utils.asgi import Response, NotFound
+import graphene
 from graphql.execution.executors.asyncio import AsyncioExecutor
 from graphql.error import format_error
 from graphql import graphql, print_schema
 import json
-from .utils import schema_for_database
+from .utils import query_for_database

 async def post_body(request):
@@ -26,7 +27,13 @@ async def view_graphql_schema(request, datasette):
         datasette.get_database(database)
     except KeyError:
         raise NotFound("Database does not exist")
-    schema = await schema_for_database(datasette, database=database)
+    db_query = await query_for_database(datasette, database=database)
+    schema = graphene.Schema(
+        query=db_query,
+        auto_camelcase=(datasette.plugin_config("datasette-graphql") or {}).get(
+            "auto_camelcase", False
+        ),
+    )
     return Response.text(print_schema(schema))

@@ -54,10 +61,27 @@ async def view_graphql(request, datasette):
             else {},
         )

-    schema = await schema_for_database(datasette, database=database)
+    # Complex logic here for merging different databases
+    db_query = await query_for_database(datasette, database=database)
+
+    db_queries = {}
+    for name in datasette.databases:
+        db_queries[name] = await query_for_database(datasette, database=name)

-    if request.args.get("schema"):
-        return Response.text(print_schema(schema))
+    root_query_properties = {
+        "{}_db".format(db_name): graphene.Field(
+            db_query, resolver=lambda parent, info: {}
+        )
+        for db_name, db_query in db_queries.items()
+    }
+    RootQuery = type("RootQuery", (graphene.ObjectType,), root_query_properties)
+
+    schema = graphene.Schema(
+        query=RootQuery,
+        auto_camelcase=(datasette.plugin_config("datasette-graphql") or {}).get(
+            "auto_camelcase", False
+        ),
+    )

     incoming = {}
     if body:
diff --git a/datasette_graphql/utils.py b/datasette_graphql/utils.py
index ae7e96c..d85de2c 100644
--- a/datasette_graphql/utils.py
+++ b/datasette_graphql/utils.py
@@ -57,7 +57,7 @@ class PageInfo(graphene.ObjectType):
     hasNextPage = graphene.Boolean()

-async def schema_for_database(datasette, database=None, tables=None):
+async def query_for_database(datasette, database=None, tables=None):
     db = datasette.get_database(database)
     hidden_tables = await db.hidden_table_names()

@@ -74,7 +74,8 @@ async def schema_for_database(datasette, database=None, tables=None):

     for table, meta in table_metadata.items():
         column_names = meta.graphql_columns.values()
-        options = list(zip(column_names, column_names))
+        column_names_for_enums = [c for c in column_names if not _is_sunder(c)]
+        options = list(zip(column_names_for_enums, column_names_for_enums))
         sort_enum = graphene.Enum.from_enum(
             Enum("{}Sort".format(meta.graphql_name), options)
         )
@@ -190,14 +191,11 @@ async def schema_for_database(datasette, database=None, tables=None):
         )

     Query = type(
-        "Query", (graphene.ObjectType,), {key: value for key, value in to_add},
-    )
-    return graphene.Schema(
-        query=Query,
-        auto_camelcase=(datasette.plugin_config("datasette-graphql") or {}).get(
-            "auto_camelcase", False
-        ),
+        "{}Query".format(db.name),
+        (graphene.ObjectType,),
+        {key: value for key, value in to_add},
     )
+    return Query

 def make_table_collection_class(table, table_class, meta):
@@ -576,3 +574,13 @@ class Namer:
             suffix += 1
         self.names.add(value)
         return value
+
+
+def _is_sunder(name):
+    """Returns True if a _sunder_ name, False otherwise."""
+    return (
+        name[0] == name[-1] == "_"
+        and name[1:2] != "_"
+        and name[-2:-1] != "_"
+        and len(name) > 2
+    )
diff --git a/tests/test_schema_for_database.py b/tests/test_schema_for_database.py
index 0a0c571..9d2b13e 100644
--- a/tests/test_schema_for_database.py
+++ b/tests/test_schema_for_database.py
@@ -1,5 +1,5 @@
 from datasette.app import Datasette
-from datasette_graphql.utils import schema_for_database
+from datasette_graphql.utils import query_for_database
 from graphql.execution.executors.asyncio import AsyncioExecutor
 from graphql import graphql
 import pytest
@@ -8,7 +8,7 @@ from .fixtures import ds, db_path

 @pytest.mark.asyncio
 async def test_schema(ds):
-    schema = await schema_for_database(ds)
+    schema = await query_for_database(ds)

     query = """{
         users {
simonw commented 4 years ago

I'm not going to implement this unless someone asks for it. I think it adds too much implementation complexity and would encourage GraphQL schemas that are far too big.