**Open** · SirJalias opened this issue 2 years ago
Hey. Thanks for the very informative post!
✅ I think the UI part is good and something we do want to reach to get detailed, per-property information
❓ The `Schemas` column in the UI is a bit confusing.. as I understand, it's a link to the service that defines it? Or is it a link to `Query.brands`? If it's a service, I'd name the column accordingly.
❓ What happens if you have a `brands` service, but an `images` service, for example, extends `type Brand`? Will you need to show multiple services linked to the type in the UI's `Schemas` column? Or is it going to be under the specific `Brand.url` property?
❓ I see you have `type_def_operations` tied to the service table.. but I don't think that's how it should be: one query can hit multiple federated services. But I guess we can discuss the final DB structure in the PR itself, not here.
> ...called usage reporting plugin, this plugin will take all the requests within a period of time and when this time comes or the data is larger than the configured value, it is sent to the schema registry to the `/api/ingress/traces` endpoint
Regarding architecture, I understand that this plugin seems useful, and in small projects it's simple to integrate. But putting it into the gateway seems like a risky move to me, because:

⚠️ if it makes sync requests for every operation, then it can overload the schema registry / DB, making it more like a real-time service, which we don't particularly want. This can cause the service to fail responding to `/schema/compose` and `/schema/latest` requests, which is very bad

⚠️ if it does aggregation or sampling (throwing away some queries) before it makes requests to the schema registry, then we won't get sufficiently detailed information about usage
That's why we're using an async query processor (https://github.com/pipedrive/graphql-schema-registry/tree/master/src/worker/analyzeQueries), where you can control the load / processing speed yourself. It doesn't have the same performance (speed of queries), of course, but I think that's a better architecture, though somewhat more complex for smaller projects, as it needs an event bus (Kafka) that is responsible for storing queries. It can extract the client name/version here https://github.com/pipedrive/graphql-schema-registry/blob/master/src/worker/analyzeQueries/index.ts#L77 if it's passed down from the gateway's headers.
Having said that, I think we can accept your sync solution (the plugin) into v4, but only if the UI in the end can show/work with both sync & async datasources. So for the end user, it should be a simple choice:
So I'd ask you to check & change the async worker to add/update data in your DB tables too (see examples for the setup), so that you could see usage in your views. (Or we can collaborate on the same PR.)
P.S. I wonder how you are going to show in the UI types that migrate from one service to another, and tie that to the usage too.
Hello @tot-ra ,
I am opening this issue to share the features the teams at @ManoManoTech have built over the last months, following issues #123 & #124. When we started, the repository was at v3; it is now at v4 and has diverged a lot from what we have, so we would like to know whether it is worth focusing on merging the features. I will explain what we have done.
1 - Schema Breakdown
In order to know whether a breaking change can be allowed, we need to store the different fields of every query/mutation/subscription, so that when a new schema is pushed this information is broken down into parts.
So we have created tables starting with `type_def_*`, and I will explain their relationships:

- 1 service can contain n operations, defined in `type_def_operations` with an `operation_id`.
- 1 operation stored in `type_def_operations` can have n parameters, stored in `type_def_operation_parameters`. A parameter can be an input field or the response of the operation, represented with the `is_output` field: 0 means it is an input, and 1 means it is the response type.
- The table `type_def_types` stores the name and the kind (SCALAR, ENUM, DIRECTIVE, or OBJECT) of every schema type, and their definitions are stored in `type_def_fields`.
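The relationships above can be sketched as TypeScript types. This is a hedged sketch: the column sets are inferred from the tables shown in this issue, and anything not shown (for example the full column list of `type_def_operation_parameters`) is an assumption.

```typescript
// Sketch of the type_def_* relationships described above.
// Column sets are inferred from the tables in this issue; columns not
// shown there (e.g. the parameter table's type_id) are assumptions.

interface Service {
  id: number;
  name: string;
  is_active: 0 | 1;
  updated_time: string | null;
  added_time: string;
  url: string;
}

// 1 service -> n operations
interface TypeDefOperation {
  id: number;
  name: string;
  description: string | null;
  type: "QUERY" | "MUTATION" | "SUBSCRIPTION";
  service_id: number; // FK -> Service.id
}

// 1 operation -> n parameters; is_output = 0 means input, 1 means response type
interface TypeDefOperationParameter {
  id: number;
  operation_id: number; // FK -> TypeDefOperation.id
  type_id: number;      // FK -> TypeDefType.id (assumed column name)
  is_output: 0 | 1;
}

interface TypeDefType {
  id: number;
  name: string;
  description: string | null;
  type: "SCALAR" | "ENUM" | "DIRECTIVE" | "OBJECT";
}

interface TypeDefField {
  id: number;
  name: string;
  description: string | null;
  is_nullable: 0 | 1;
  is_array: 0 | 1;
  is_array_nullable: 0 | 1;
  is_deprecated: 0 | 1;
  parent_type_id: number;   // FK -> TypeDefType.id (owning type or directive)
  children_type_id: number; // FK -> TypeDefType.id (the field's type)
}
```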
So, let's do an example with the data of the request to the push endpoint, with a schema of the brand service:
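The pushed schema itself is not reproduced in the issue text, but a plausible reconstruction can be read back out of the `type_def_fields` rows below (nullability follows the `is_nullable` column; the `@key` fields and the operations' arguments are assumptions):

```graphql
# Hypothetical reconstruction of the brand subgraph SDL.
# Field nullability is taken from the type_def_fields rows below;
# @key fields and query arguments are assumptions.
type Brand @key(fields: "id") {
  brandId: Int!
  description: String
  id: ID!
  logo: String
  market: String!
  platform: String!
  slug: String!
  title: String!
}

type Query {
  brand(id: ID!): Brand
  brands: [Brand]
}
```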
So the values stored in the different tables are:
**Services:**

| id | name | is_active | updated_time | added_time | url |
| --- | --- | --- | --- | --- | --- |
| 4 | brands | 1 | NULL | 2022-09-01 15:00:20 | http://127.0.0.1:4003/api/graphql/brands |
**type_def_operations:**

| id | name | description | type | service_id |
| --- | --- | --- | --- | --- |
| 1 | _entities | NULL | QUERY | 4 |
| 2 | _service | NULL | QUERY | 4 |
| 3 | brand | NULL | QUERY | 4 |
| 4 | brands | NULL | QUERY | 4 |
**type_def_types:**

| id | name | description | type |
| --- | --- | --- | --- |
| 1 | _Any | NULL | SCALAR |
| 2 | Int | NULL | SCALAR |
| 3 | String | NULL | SCALAR |
| 4 | ID | NULL | SCALAR |
| 5 | extends | NULL | DIRECTIVE |
| 6 | external | NULL | DIRECTIVE |
| 7 | key | NULL | DIRECTIVE |
| 8 | provides | NULL | DIRECTIVE |
| 9 | requires | NULL | DIRECTIVE |
| 10 | Brand | NULL | OBJECT |
| 11 | _Service | NULL | OBJECT |
| 12 | _Entity | NULL | OBJECT |
**type_def_fields:**

| id | name | description | is_nullable | is_array | is_array_nullable | is_deprecated | parent_type_id | children_type_id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | fields | NULL | 0 | 0 | 1 | 0 | 7 | 3 |
| 2 | fields | NULL | 0 | 0 | 1 | 0 | 8 | 3 |
| 3 | fields | NULL | 0 | 0 | 1 | 0 | 9 | 3 |
| 4 | brandId | NULL | 0 | 0 | 1 | 0 | 10 | 2 |
| 5 | description | NULL | 1 | 0 | 1 | 0 | 10 | 3 |
| 6 | id | NULL | 0 | 0 | 1 | 0 | 10 | 4 |
| 7 | logo | NULL | 1 | 0 | 1 | 0 | 10 | 3 |
| 8 | market | NULL | 0 | 0 | 1 | 0 | 10 | 3 |
| 9 | platform | NULL | 0 | 0 | 1 | 0 | 10 | 3 |
| 10 | slug | NULL | 0 | 0 | 1 | 0 | 10 | 3 |
| 11 | title | NULL | 0 | 0 | 1 | 0 | 10 | 3 |
| 12 | sdl | The sdl representing the federated service capabilities. Includes federation directives, removes federation types, and includes rest of full schema after schema directives have been applied | 1 | 0 | 1 | 0 | 11 | 3 |
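As a hedged illustration of how the boolean columns combine into GraphQL type notation (my reading of `is_array_nullable` as the nullability of the list itself is an assumption; all rows above have `is_array = 0`, so the list branch is not exercised by this data):

```typescript
// Sketch: render the type_def_fields flag columns as GraphQL SDL notation.
// The interpretation of is_array_nullable is an assumption.
interface FieldFlags {
  is_nullable: 0 | 1;
  is_array: 0 | 1;
  is_array_nullable: 0 | 1;
}

function renderFieldType(typeName: string, f: FieldFlags): string {
  let rendered = typeName;
  if (!f.is_nullable) rendered += "!"; // inner type is non-null
  if (f.is_array) {
    rendered = `[${rendered}]`;
    if (!f.is_array_nullable) rendered += "!"; // the list itself is non-null
  }
  return rendered;
}

// Row 4 above (brandId):     renderFieldType("Int",    { is_nullable: 0, is_array: 0, is_array_nullable: 1 }) -> "Int!"
// Row 5 above (description): renderFieldType("String", { is_nullable: 1, is_array: 0, is_array_nullable: 1 }) -> "String"
```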
This will be reflected in the UI like this:

If someone wants to know the "contract" of the query `brands`, clicking on it shows its definition.

To give some numbers, we have in our organization around 30 queries and 60 objects provided by 14 subgraphs, and this number is going to increase in the coming months.
2 - Client awareness
The objective of this feature is to have information in the UI about who ( client & version ) is using a query or an object in the super-graph.
The architecture is summarized in this diagram:
Apollo Gateway receives all the requests the clients perform, and there is a plugin from Apollo called the usage reporting plugin. This plugin collects all the requests within a period of time, and when that time comes, or the data grows larger than the configured value, it is sent to the schema registry at the `/api/ingress/traces` endpoint. So, as you can see, no custom gateway is needed, as this plugs into the tools already available from Apollo.
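A minimal sketch of that wiring, assuming an Apollo Server v3 gateway (the registry URL is a hypothetical local address; to my understanding the plugin appends `/api/ingress/traces` to `endpointUrl` when posting its reports, which is why the registry exposes that path):

```typescript
// Sketch: point Apollo's usage reporting plugin at the schema registry
// instead of Apollo Studio. Assumes Apollo Server v3; the plugin may also
// require an APOLLO_KEY environment variable to be set.
import { ApolloServer } from "apollo-server";
import { ApolloGateway } from "@apollo/gateway";
import { ApolloServerPluginUsageReporting } from "apollo-server-core";

const gateway = new ApolloGateway({ /* service list / supergraph config */ });

const server = new ApolloServer({
  gateway,
  plugins: [
    ApolloServerPluginUsageReporting({
      // Base URL of the schema registry (hypothetical local address);
      // reports are buffered and sent when the configured interval elapses
      // or the buffered data exceeds the configured size.
      endpointUrl: "http://localhost:6001",
    }),
  ],
});
```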
When the usage report reaches the schema registry and is decoded, the payload is something similar to:

As you can see, there is all the information we need to do the client tracking: `"clientName": "test-gateway-client", "clientVersion": "0.0.1"`, and also the query performed.
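The decoded payload was not reproduced above; as a trimmed, hypothetical sketch, only `clientName`/`clientVersion` and the per-query grouping are grounded in this issue, and the rest of the shape is an assumption based on Apollo's trace report format:

```typescript
// Trimmed, hypothetical sketch of a decoded usage report.
// Only clientName / clientVersion are quoted in this issue; the
// surrounding structure is an assumption.
const decodedReport = {
  tracesPerQuery: {
    "# brands\nquery brands { brands { id title } }": {
      trace: [
        {
          clientName: "test-gateway-client",
          clientVersion: "0.0.1",
          durationNs: 1234567,
        },
      ],
    },
  },
};
```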
When the request is received, the schema registry performs these actions:

So if there is a key in the Redis store, we increment the operations and the errors according to the message received.
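A minimal sketch of that increment step, written against an in-memory store for clarity (the key scheme `usage:<client>:<version>:<operation>` and the counter names are my invention, not the actual keys used):

```typescript
// Sketch of the per-client usage increment described above.
// Key scheme and counter names are hypothetical; with Redis the same
// logic would be an INCRBY/HINCRBY on the same key.
type UsageStore = Map<string, { operations: number; errors: number }>;

function recordUsage(
  store: UsageStore,
  clientName: string,
  clientVersion: string,
  operation: string,
  operations: number,
  errors: number
): void {
  const key = `usage:${clientName}:${clientVersion}:${operation}`;
  const current = store.get(key) ?? { operations: 0, errors: 0 };
  current.operations += operations;
  current.errors += errors;
  store.set(key, current);
}
```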
The UI looks this way when the stats button is clicked:
At scale, if the gateway receives 5k requests, 5k requests are not made to the schema registry; how the information from those requests is grouped depends on the configuration of the usage reporting plugin.
Right now we have found some bugs, and we know we need to dedicate some time to fixing them and enhancing this feature, or else to some architectural changes, like using Kafka, in order not to lose any usage message and to keep the schema registry's main thread from having to process all the information.
About the load we have on the Apollo gateway right now in prod: it is not too high, 2.5k req/min.
3 - Breaking change control
Since there is a way to know which data is used by the clients, when a new version of a subgraph's schema is pushed we can check whether it contains a breaking change, allow the push only if no clients are using the affected data, and otherwise reject the request.
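That decision can be sketched as follows (a hedged sketch; the function and field names are hypothetical, and how breaking fields are detected is outside this snippet):

```typescript
// Sketch of the breaking-change gate described above: a push containing
// breaking changes is only accepted when no client still uses the
// affected fields. Names are hypothetical.
interface FieldUsage {
  field: string;     // e.g. "Brand.slug"
  clients: string[]; // client names recently seen using this field
}

function canAcceptPush(breakingFields: string[], usage: FieldUsage[]): boolean {
  const used = new Map(usage.map((u): [string, string[]] => [u.field, u.clients]));
  // Reject if any field touched by a breaking change still has active clients.
  return breakingFields.every((f) => (used.get(f) ?? []).length === 0);
}
```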
So finally, if you got to the end of this, there is a lot of data to digest, so first of all, thank you. We are very interested in your opinion about putting all this together with what is in v4 and deciding how to move forward. If following this issue gets too difficult, we also propose a meeting to get aligned.
Thank you very much