Open LavaDesu opened 3 years ago
If the devs are interested, I could try my hand at implementing this. I made a proof of concept on my fork which allows a simple query of a user by id.
@LavaDesu thanks for the proof of concept! Are you able to do some benchmarks (just using `ab`) against a normal API request when only requesting a small subset of data? I'd be interested to know how much performance improvement this provides, as that would be the main reason we would consider taking this on.
I would like to say that the benefits of implementing a GraphQL API are not only performance, but also the flexibility that comes with it. Theoretically, you would be able to do something like
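For illustration (field names here are hypothetical, not the actual osu!-web schema), a single query could fetch a user, their recent scores, and each score's beatmap:

```graphql
query {
  user(id: 2) {
    username
    recentScores(limit: 3) {
      pp
      beatmap {
        title
        starRating
      }
    }
  }
}
```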
and all that can be done in one single request, which would be faster than multiple REST requests due to overhead. It is also strongly typed with an easy-to-read schema, and can throw errors if the server sends non-conforming data (such as sending null for a non-nullable field).
But here are the "benchmarks"; unfortunately there were a few problems with them. Response times were measured with `curl --write-out '%{time_total}'`:
- Response time: ~315 ms (avg over 5 requests)
- Response time: ~256 ms (avg over 5 requests)
- Response time: ~228 ms (avg over 5 requests)
- Response time: ~263 ms (avg over 5 requests)
Also consider that security and authorization have to be implemented, adding overhead, in addition to not simply allowing arbitrary query combinations for performance reasons; i.e. predictable query shapes are preferable.
Yep, obviously the queries would need to be limited to what is already indexed and available. Or at least be run on dedicated hardware paid for by someone.
At the end of the day, the overhead of actually surfacing more information than what we are already providing via endpoints would likely be the same or more, compared to just making REST endpoints.
> not simply allowing arbitrary query combinations
Definitely. We don't want people to execute potentially abusive queries. We can employ a few methods to alleviate this.
The depth of a query can be limited to a set amount. This avoids people abusing circular fields.
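For instance, with a hypothetical self-referential `friends` field, depth limiting would reject runaway nesting like:

```graphql
query {
  user(id: 2) {
    friends {
      friends {
        friends {      # rejected once nesting exceeds the configured depth
          username
        }
      }
    }
  }
}
```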
We can also assign a "cost" to each field (say, 1 for regular fields and 4 for fields that need to fetch another model). If the total cost of the query exceeds a set limit, we can reject it.
The library I used (Lighthouse) has both of these as built-in options, but we can override their rules provider and add our own validation rules as well in case we need more complex rate-limiting (like a leaky bucket based on cost). There are also other methods of defending against expensive queries; a good resource is this page.
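Under that scheme, a query's cost could be tallied like this (field names and weights are illustrative, not Lighthouse's actual defaults):

```graphql
query {
  user(id: 2) {        # +4: fetches a model
    username           # +1
    statistics {       # +4: fetches another model
      pp               # +1
    }
    beatmapsets {      # +4: fetches more models
      title            # +1
    }
  }
}
# Total cost: 15 — rejected if the limit were, say, 10.
```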
I'm going to attempt this again. Currently I've got most of the base done, including auth, scopes, and complexity calculation. Things left to be done are rate-limiting and finally structuring the schema.
Should I open a draft PR or only open one after it's all completed?
Sounds fine, yes.
Endpoints like `/users/{user}` and `/beatmapsets/{beatmapset}` return huge amounts of data that most clients probably never need. For instance, `beatmapsets.recent_favourites` can take up >50% of the total JSON response, but there are very rare cases where you actually need that as an API consumer.

A GraphQL API can alleviate issues like this on both the client and server by transferring and processing only the necessary data, while also allowing complex yet efficient queries that would otherwise require multiple REST requests.
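For example, a client interested only in a beatmapset's title and favourite count could request exactly those fields (names are hypothetical, mirroring the REST payload rather than a real schema):

```graphql
query {
  beatmapset(id: 1) {
    title
    favouriteCount   # recent_favourites is simply never selected, so never sent
  }
}
```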