supabase / pg_graphql

GraphQL support for PostgreSQL
https://supabase.github.io/pg_graphql
Apache License 2.0
2.8k stars 96 forks source link

Configurable Cache-Control #535

Open mathiversen opened 1 week ago

mathiversen commented 1 week ago

Summary

I'd like pg_graphql to return information on how the returned data should be cached, with inspiration from how it's been done in Async GraphQL.

Rationale

It would be highly beneficial if you could configure cache-control with the extension so that the consuming server/client doesn't have to implement such logic. This allows for centralized cache control management and ensures consistency across different clients/servers.

Design

This is how I imagine it:

  1. You can set a global cache_control that acts as the default, similar to how you set inflect_names.
  2. Override the global default with a table cache_control.
  3. Override the global default and the table default with a field/computed field cache_control.
  4. Override the global default with a function cache_control.

Examples

Global default

comment on schema "public" is
e'@graphql({"inflect_names": true, "cache_control": "public, max-age=60"})';

Table override

comment on table "person" is
e'@graphql({"cache_control": "max-age=30"})';

Field override

comment on column "person".password is
e'@graphql({"cache_control": "private, max-age=0"})';

Computed field & functions override

comment on function public._role is
e'@graphql({"cache_control": "public, max-age=30"})';

Req example

{
  personCollection(first: 1) {
    edges {
      node {
        id
        username
        password
        role
      }
    }
  }
}

Res example

{
  "data": {
    "personCollection": {
      "edges": {
        "node": {
          "id": 1,
          "username": "kalle",
          "password": "qwerty",
          "role": "user"
        }
      },
    }
  },
  "cacheControl": "private, max-age=0"
  "errors": []
}

Since the client requested the field password, the default value is overwritten and private, max-age=0 takes precedence. I chose to use the default browser header formatting with a single key-value over a full JSON. This requires a little more work in the extension to parse each string and calculate the final output, but I think it will provide a better experience from the server's perspective as it can just take the value and use that in the response to clients further down the line.

Drawbacks

There will be more logic/code inside the extension.

Alternatives

If this is not provided by the extension itself, then the servers/clients need to parse and evaluate the request/response to decide on how the data can be cached. I've not found a good solution for doing so other than writing my own logic, which doesn't live up to the potential of the fine-grained control described above.

Unresolved Questions

  1. Should the cache-control values be strings or objects in the comments?
  2. Where should the cache-control info be in the final response? In the top-level JSON object next to data and errors?
  3. What cache-control fields/options should be supported? Async Graphql only support public/private and max-age. Should pg_graphql support the same or more?

Thank you for publishing & maintaining this extension!

olirice commented 1 week ago

On the Supabase platform pg_graphql/GraphQL is exposed using a PostgREST RPC function named graphql_public.graphql

PostgREST has a mechanism for setting arbitrary response headers. You can read more about it here

If you need this functionality, one option would be to

  1. copy the graphql_public.graphql function into the public schema public.graphql with the same signature
  2. change the function language to plpgsql
  3. add the header logic you want

Something like:

CREATE OR REPLACE FUNCTION graphql_public.graphql(
    "operationName" text DEFAULT NULL::text,
    query text DEFAULT NULL::text,
    variables jsonb DEFAULT NULL::jsonb,
    extensions jsonb DEFAULT NULL::jsonb)
    RETURNS jsonb
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE PARALLEL UNSAFE
AS $BODY$

            perform set_config(
                 'response.headers',
                 '[{"Cache-Control": "public"}, {"Cache-Control": "max-age=259200"}]',
                 true
             );

            return graphql.resolve(
                query := query,
                variables := coalesce(variables, '{}'),
                "operationName" := "operationName",
                extensions := extensions
            );

$BODY$;

I appreciate that it won't be as declarative and comprehensive as your proposal but hopefully this can give you a place to drop the logic for the your most critical hot pathes

you'd then send your GraphQL queries through your own endpoint at

https://<project_ref>.supabase.co/rest/v1/rpc/graphql
mathiversen commented 1 week ago

Thanks for your input! We're using pg_graphql via Neon and run our own REST API as an intermediary. I have limited experience with PostgREST, but your proposed solution seems to overlook the different req/res variations unless, as you mentioned, we implement it ourselves - which I'm trying to avoid. I believe all users of the extension would benefit from a more granular cache-control mechanism.

Instead of parsing the req/res in middleware or in a pg function to identify sensitive data, the extension could check comments to determine specific cache-control requirements for each field. I’m not sure how extensive your logic is for detecting and setting these headers in Supabase, but it seems error-prone, as each header must be evaluated and set individually. If the extension returned a computed header value, it could be used to set response.headers, making the process more dynamic.

olirice commented 1 week ago

While I'm not against this functionality in general, there is no globally accepted standard for returning cache info for GraphQL + the majority of users are operating through PostgREST where there is an escape hatch for setting cache control headers. For those reasons, this is unlikely to get picked up in the timeframe you need it.

If you're already running your own intermediary API I'd recommend looking into OTS options for graphql middleware caching