simonw / datasette-graphql

Datasette plugin providing an automatic GraphQL API for your SQLite databases
https://datasette-graphql-demo.datasette.io/
Apache License 2.0

Support querying multiple databases at once #44

Closed: simonw closed this issue 4 years ago

simonw commented 4 years ago

If you have multiple databases connected, /graphql could expose a field for each one.

simonw commented 4 years ago

I've been agonizing for days over how to make this change in a responsible way. The question is whether to change the root query block so that tables are nested under a field for each database.

So today a query like this:

{
  repos {
    nodes {
      id
      full_name
      description
    }
  }
}

Becomes this instead:

{
  github {
    repos {
      nodes {
        id
        full_name
        description
      }
    }
  }
}

Making this change would break all existing datasette-graphql deployments. There aren't many of them yet, so it's not quite too late to do that, but it's getting close.

If I want to preserve backwards compatibility there are a few ways I could do it:

simonw commented 4 years ago

If I were designing this plugin from scratch today, what would I do here?

I think I'd make database names the root nodes. I'm slightly unhappy with the extra verbosity, though, considering that the vast majority of Datasette instances only attach a single database.

simonw commented 4 years ago

That verbosity argument is convincing to me: I don't like forcing every user of this plugin to forever include the name of their root database (which might be something boring like data.db) on every query they ever make:

{
  data {
    repos {
      nodes {
        id
        full_name
        description
      }
    }
  }
}

So I think I should optimize for the single-database case, which is what the plugin supports today, but also provide the ability to query additional databases when more than one is connected.

simonw commented 4 years ago

So a plan for that: all tables from the first connected database are exposed on the root. Any other connected databases become available as de-duplicated root nodes (named so as not to collide with any table names), maybe even with a _db suffix. A plugin configuration setting could turn this feature on or off.
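
A rough sketch of that naming scheme (not code from the plugin; the function and its arguments here are hypothetical):

def database_root_fields(extra_database_names, reserved_names):
    # Map each extra database to a root field name with a _db suffix,
    # de-duplicated against table names already exposed on the root
    taken = set(reserved_names)
    fields = {}
    for db_name in extra_database_names:
        candidate = "{}_db".format(db_name)
        suffix = 2
        while candidate in taken:
            candidate = "{}_db_{}".format(db_name, suffix)
            suffix += 1
        taken.add(candidate)
        fields[db_name] = candidate
    return fields

database_root_fields(["github", "legislators"], {"repos", "users"})
# {'github': 'github_db', 'legislators': 'legislators_db'}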

simonw commented 4 years ago

After prototyping this for a bit, I'm not sure this feature is worthwhile.

I'm getting errors like this:

  File "/Users/simon/.local/share/virtualenvs/datasette-graphql-n1OSJCS8/lib/python3.8/site-packages/graphene/types/typemap.py", line 97, in graphene_reducer
    assert _type.graphene_type == type, (
AssertionError: Found different types with the same name in the schema: usersCollection, usersCollection.

So I'll need to de-dupe the names of individual field types across the entire schema.
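
The prototype's Namer class (visible at the end of the diff below) already does this kind of suffix-based de-duplication; the fix would be to share a single instance across every database's schema. A sketch of the idea (the exact Namer internals here are assumptions, not the plugin's implementation):

class Namer:
    # One shared instance across all databases, so the second
    # "usersCollection" gets renamed instead of colliding
    def __init__(self):
        self.names = set()

    def name(self, value):
        candidate = value
        suffix = 2
        while candidate in self.names:
            candidate = "{}_{}".format(value, suffix)
            suffix += 1
        self.names.add(candidate)
        return candidate

namer = Namer()
namer.name("usersCollection")  # 'usersCollection'
namer.name("usersCollection")  # 'usersCollection_2'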

Much more concerning: I'm seeing 15s load times building the schema once I attach a few different database files! This 15s delay affects everything, since the GraphQL code introspects the schema on every HTTP request.

I could fix this by caching the generated schema, but then I'll need to figure out how to invalidate the cache when the underlying database schema changes.
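
A minimal sketch of one invalidation strategy: key the cache on SQLite's schema_version pragma, which increments on any schema change. The cache dictionary and function name here are hypothetical; schema_for_database is the plugin's real builder:

from datasette_graphql.utils import schema_for_database

_schema_cache = {}

async def cached_schema_for_database(datasette, database=None):
    db = datasette.get_database(database)
    # PRAGMA schema_version increments whenever the schema changes,
    # so a changed database automatically misses the cache
    version = (await db.execute("PRAGMA schema_version")).first()[0]
    key = (db.name, version)
    if key not in _schema_cache:
        # Entries for older versions of this database could be evicted here
        _schema_cache[key] = await schema_for_database(datasette, database=database)
    return _schema_cache[key]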

simonw commented 4 years ago

Here's where I got to with the prototype:

diff --git a/datasette_graphql/__init__.py b/datasette_graphql/__init__.py
index bef59a8..574b6a7 100644
--- a/datasette_graphql/__init__.py
+++ b/datasette_graphql/__init__.py
@@ -1,11 +1,12 @@
 from click import ClickException
 from datasette import hookimpl
 from datasette.utils.asgi import Response, NotFound
+import graphene
 from graphql.execution.executors.asyncio import AsyncioExecutor
 from graphql.error import format_error
 from graphql import graphql, print_schema
 import json
-from .utils import schema_for_database
+from .utils import query_for_database

 async def post_body(request):
@@ -26,7 +27,13 @@ async def view_graphql_schema(request, datasette):
         datasette.get_database(database)
     except KeyError:
         raise NotFound("Database does not exist")
-    schema = await schema_for_database(datasette, database=database)
+    db_query = await query_for_database(datasette, database=database)
+    schema = graphene.Schema(
+        query=db_query,
+        auto_camelcase=(datasette.plugin_config("datasette-graphql") or {}).get(
+            "auto_camelcase", False
+        ),
+    )
     return Response.text(print_schema(schema))

@@ -54,10 +61,27 @@ async def view_graphql(request, datasette):
             else {},
         )

-    schema = await schema_for_database(datasette, database=database)
+    # Complex logic here for merging different databases
+    db_query = await query_for_database(datasette, database=database)
+
+    db_queries = {}
+    for name in datasette.databases:
+        db_queries[name] = await query_for_database(datasette, database=name)

-    if request.args.get("schema"):
-        return Response.text(print_schema(schema))
+    root_query_properties = {
+        "{}_db".format(db_name): graphene.Field(
+            db_query, resolver=lambda parent, info: {}
+        )
+        for db_name, db_query in db_queries.items()
+    }
+    RootQuery = type("RootQuery", (graphene.ObjectType,), root_query_properties)
+
+    schema = graphene.Schema(
+        query=RootQuery,
+        auto_camelcase=(datasette.plugin_config("datasette-graphql") or {}).get(
+            "auto_camelcase", False
+        ),
+    )

     incoming = {}
     if body:
diff --git a/datasette_graphql/utils.py b/datasette_graphql/utils.py
index ae7e96c..d85de2c 100644
--- a/datasette_graphql/utils.py
+++ b/datasette_graphql/utils.py
@@ -57,7 +57,7 @@ class PageInfo(graphene.ObjectType):
     hasNextPage = graphene.Boolean()

-async def schema_for_database(datasette, database=None, tables=None):
+async def query_for_database(datasette, database=None, tables=None):
     db = datasette.get_database(database)
     hidden_tables = await db.hidden_table_names()

@@ -74,7 +74,8 @@ async def schema_for_database(datasette, database=None, tables=None):

     for table, meta in table_metadata.items():
         column_names = meta.graphql_columns.values()
-        options = list(zip(column_names, column_names))
+        column_names_for_enums = [c for c in column_names if not _is_sunder(c)]
+        options = list(zip(column_names_for_enums, column_names_for_enums))
         sort_enum = graphene.Enum.from_enum(
             Enum("{}Sort".format(meta.graphql_name), options)
         )
@@ -190,14 +191,11 @@ async def schema_for_database(datasette, database=None, tables=None):
         )

     Query = type(
-        "Query", (graphene.ObjectType,), {key: value for key, value in to_add},
-    )
-    return graphene.Schema(
-        query=Query,
-        auto_camelcase=(datasette.plugin_config("datasette-graphql") or {}).get(
-            "auto_camelcase", False
-        ),
+        "{}Query".format(db.name),
+        (graphene.ObjectType,),
+        {key: value for key, value in to_add},
     )
+    return Query

 def make_table_collection_class(table, table_class, meta):
@@ -576,3 +574,13 @@ class Namer:
             suffix += 1
         self.names.add(value)
         return value
+
+
+def _is_sunder(name):
+    """Returns True if a _sunder_ name, False otherwise."""
+    return (
+        name[0] == name[-1] == "_"
+        and name[1:2] != "_"
+        and name[-2:-1] != "_"
+        and len(name) > 2
+    )
diff --git a/tests/test_schema_for_database.py b/tests/test_schema_for_database.py
index 0a0c571..9d2b13e 100644
--- a/tests/test_schema_for_database.py
+++ b/tests/test_schema_for_database.py
@@ -1,5 +1,5 @@
 from datasette.app import Datasette
-from datasette_graphql.utils import schema_for_database
+from datasette_graphql.utils import query_for_database
 from graphql.execution.executors.asyncio import AsyncioExecutor
 from graphql import graphql
 import pytest
@@ -8,7 +8,7 @@ from .fixtures import ds, db_path

 @pytest.mark.asyncio
 async def test_schema(ds):
-    schema = await schema_for_database(ds)
+    schema = await query_for_database(ds)

     query = """{
         users {
simonw commented 4 years ago

I'm not going to implement this unless someone asks for it. I think it adds too much implementation complexity and would encourage GraphQL schemas that are far too big.