kamilkisiela / graphql-hive

GraphQL Hive is a schema registry and observability platform
https://the-guild.dev/graphql/hive
MIT License

hive as a persisted operations/documents store #659

Closed n1ru4l closed 3 days ago

n1ru4l commented 1 year ago

Having GraphQL Hive act as a persisted operations store would be great.

Critical user flow

Nice to have (stretch goals; follow up tasks)


Allow users to decide whether to do breaking change detection based on app deployments or usage data.

Delete persisted operation deployment flow

  1. Drop the persisted operation deployment via UI or CLI
  2. Schedule Async Task for actually deleting the persisted operation documents from S3
  3. (optional) Re-check for conditional breaking changes

We must figure out how to incorporate the persisted operation schema coordinates into the hive check/breaking change detection flow (usage data). As long as a persisted operation deployment is active, its fields count as being in use, even if there is no data within the retention period. Once a persisted operation deployment has been deleted or marked as retired/inactive, removal of that deployment's schema coordinates is no longer blocked by it.
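In pseudo-code, the blocking rule described above might look like the following sketch (all names are illustrative, not Hive's actual check logic):

```typescript
// Sketch: removing a schema coordinate is blocked while it appears in
// recent usage data OR in any active persisted operation deployment.
// All names here are illustrative, not part of the Hive codebase.
type RemovalCheckInput = {
  coordinate: string;
  /** coordinates observed within the usage retention window */
  usedCoordinates: Set<string>;
  /** coordinates referenced by any still-active deployment */
  activeDeploymentCoordinates: Set<string>;
};

function isRemovalBlocked(input: RemovalCheckInput): boolean {
  return (
    input.usedCoordinates.has(input.coordinate) ||
    input.activeDeploymentCoordinates.has(input.coordinate)
  );
}
```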

Based on the usage data, we can notify users when a client version seems unused (e.g. old mobile client).

Documentation

Details

Some ideas on how to store stuff...

S3 Key Structure

Here we write the GraphQL documents for as long as the deployment is active. We need to ensure they are removed from S3 once the deployment becomes inactive, so a transactional background job seems inevitable.

persisted/{orgId}/{project}/{target}/{client}/{clientVersion}/{operationHash}
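A small helper for assembling this key could look like the following sketch (the segment names simply mirror the pattern above and are not an actual Hive API):

```typescript
// Sketch of building the S3 key for a persisted document from the
// pattern persisted/{orgId}/{project}/{target}/{client}/{clientVersion}/{operationHash}.
// All names here are illustrative.
type PersistedDocumentKeyParts = {
  orgId: string;
  project: string;
  target: string;
  client: string;
  clientVersion: string;
  operationHash: string;
};

function buildPersistedDocumentKey(p: PersistedDocumentKeyParts): string {
  return [
    "persisted",
    p.orgId,
    p.project,
    p.target,
    p.client,
    p.clientVersion,
    p.operationHash,
  ].join("/");
}
```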

SQL

CREATE TABLE "persisted_document_deployments" (
  "id" uuid PRIMARY KEY,
  "target_id" uuid NOT NULL,
  "client_name" text NOT NULL,
  "client_version" text NOT NULL,
  "is_active" boolean -- if it is active you should not be able to add new operations to it
);

CREATE TABLE "persisted_documents" (
  "id" uuid,
  "persisted_document_deployment_id" uuid REFERENCES "persisted_document_deployments"("id"),
  "hash" text NOT NULL,
  "operation_document" text,
  "document_s3_location" text NOT NULL, -- we should store a reference (in case we at some point have to change the key structure/pattern)
  "schema_coordinates" text[], -- see notes
  "created_at" TIMESTAMPTZ NOT NULL DEFAULT NOW() -- this column is most likely unnecessary
);

-- Nothing below is strictly required for the initial version - but it could help with breaking change detection...

CREATE INDEX "persisted_documents_pagination" on "persisted_documents" USING GIN ("schema_coordinates");

-- get list of all operations that are related to a set of schema coordinates
SELECT
  "persisted_documents"."hash"
FROM
  "persisted_documents"
  INNER JOIN
    "persisted_document_deployments"
      ON "persisted_document_deployments"."id" = "persisted_documents"."persisted_document_deployment_id"
WHERE
  "persisted_document_deployments"."is_active" = TRUE
  AND "persisted_documents"."schema_coordinates" && '{A.foo,B.ff}';

When a deployment has been created and "frozen", we could generate a schema coordinate ---> hash mapping for quick lookups of which operations a schema coordinate impacts. 🤔 Alternatively, we could execute the SQL live for each active deployment.
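The precomputed mapping could be as simple as inverting the per-document coordinate lists once, at freeze time (a sketch; the names are illustrative):

```typescript
// Sketch: invert document -> schema coordinates into
// schema coordinate -> document hashes, computed once when a
// deployment is frozen. Names are illustrative.
type PersistedDocument = { hash: string; schemaCoordinates: string[] };

function buildCoordinateIndex(
  docs: PersistedDocument[],
): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const doc of docs) {
    for (const coordinate of doc.schemaCoordinates) {
      let hashes = index.get(coordinate);
      if (!hashes) {
        hashes = new Set<string>();
        index.set(coordinate, hashes);
      }
      hashes.add(doc.hash);
    }
  }
  return index;
}
```

A breaking-change check then only needs a map lookup per affected coordinate instead of a live query per deployment.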

Unsure whether we should store all the schema coordinates used within a document alongside the document. 🤔

PROs:


Links:

kamilkisiela commented 1 year ago

It could also show a complexity score next to each document.

kamilkisiela commented 1 year ago

Could also reject documents with complexity higher than X
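Such a gate could run at upload time. As a stand-in for a real complexity calculator (which would score parsed fields, depth, etc.), here is a naive nesting-depth heuristic over the raw document text (illustrative only, not Hive's scoring):

```typescript
// Naive stand-in for a complexity score: the maximum selection-set
// nesting depth, counted from braces in the raw document text.
// A real implementation would parse the document; illustrative only.
function naiveComplexity(document: string): number {
  let depth = 0;
  let maxDepth = 0;
  for (const ch of document) {
    if (ch === "{") maxDepth = Math.max(maxDepth, ++depth);
    else if (ch === "}") depth--;
  }
  return maxDepth;
}

function rejectIfTooComplex(document: string, maxAllowed: number): void {
  const score = naiveComplexity(document);
  if (score > maxAllowed) {
    throw new Error(`Document complexity ${score} exceeds limit ${maxAllowed}`);
  }
}
```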

n1ru4l commented 1 year ago

S3 could be used as a schema registry

kamilkisiela commented 1 year ago

Yes and Hive should control it all

n1ru4l commented 1 year ago

There's some analytics stuff we can do here as well.

e.g. display, over time, how many bytes were saved on client <-> server requests by using persisted operations
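That metric could, for example, be estimated as (document size minus hash size) times request count (a rough sketch with illustrative names; it ignores compression and transport overhead):

```typescript
// Rough sketch: estimate request-payload bytes saved by sending a
// document hash instead of the full document. Illustrative only;
// ignores compression, headers, and other transport overhead.
function estimateBytesSaved(
  documentSizeBytes: number,
  hashSizeBytes: number,
  requestCount: number,
): number {
  return Math.max(0, documentSizeBytes - hashSizeBytes) * requestCount;
}
```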

kamilkisiela commented 1 year ago

Plus, some of the data processing (related to the usage reporting pipeline) could be done ahead of time, and the structure of the usage report could be quite different - much, much smaller in size (and more performant on the user side, since no processing of documents is involved).

n1ru4l commented 3 days ago

https://the-guild.dev/graphql/hive/product-updates/2024-07-30-persisted-documents-app-deployments-preview