Use GraphQL from web applications to interact with Taskcluster APIs

eliperelman commented 7 years ago

Edit

Given the priority of redeployability, the team has decided that GraphQL will not be considered for inclusion as an official API provided by services until after the first redeployable release. This is to ensure that work on GraphQL does not block or impede momentum on redeployability.

As such, I am re-proposing a solution for using GraphQL only from our web application, which is being developed in parallel to the taskcluster-tools redeployable instance. Commentary that takes this into account starts at https://github.com/taskcluster/taskcluster-rfcs/issues/96#issuecomment-372780345.

End Edit

From http://graphql.org/:

GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.

Why?

The Tools site fetches an enormous amount of data to display some tools. We have things like continuation tokens to limit the number of records, but nothing to control at a fine-grained level what data we request from particular APIs. Up until now, the most useful approach when serving a resource has been to return everything a user may need, to avoid subsequent requests, but this is becoming hard to manage on the front-end.

When requesting a list of tasks from a group, what if I could say I only wanted the name and task status from each task? GraphQL can do that.

What if I want to fetch a list of tasks, but also a selected task, in a single request? GraphQL can do that, too.
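As a rough illustration of what "only the name and task status" and "a list plus a selected task in one request" mean in practice, the sketch below trims full records down to a requested selection. It does not use a GraphQL library; `Json`, `FieldSelection`, `project`, and the sample task fields are hypothetical names made up for this example and only loosely resemble Queue data.

```typescript
// Any JSON-like value.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical selection: a field maps to `true` (leaf) or to a nested selection.
type FieldSelection = { [field: string]: true | FieldSelection };

// Keep only the selected fields, recursing into nested objects and arrays.
function project(value: Json, selection: FieldSelection): Json {
  if (Array.isArray(value)) {
    return value.map((item) => project(item, selection));
  }
  if (value === null || typeof value !== "object") {
    return value;
  }
  const out: { [key: string]: Json } = {};
  for (const field of Object.keys(selection)) {
    const wanted = selection[field];
    const child = value[field];
    if (wanted === undefined || child === undefined) continue;
    out[field] = wanted === true ? child : project(child, wanted);
  }
  return out;
}

// Made-up data standing in for what the existing APIs would return in full.
const fullResponse: Json = {
  tasks: [
    { taskId: "task-a", name: "build", state: "completed", payload: { image: "ubuntu" } },
    { taskId: "task-b", name: "test", state: "running", payload: { image: "ubuntu" } },
  ],
  selectedTask: { taskId: "task-a", name: "build", state: "completed", payload: { image: "ubuntu" } },
};

// One "request": a slim task list plus a more detailed selected task.
const slimResponse = project(fullResponse, {
  tasks: { name: true, state: true },
  selectedTask: { taskId: true, payload: { image: true } },
});
```

In a real GraphQL server the selection comes from the query document and the trimming happens during execution; the point here is only that the client, not the endpoint, decides which fields come back.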

I'm not sure if this is feasible, but I think this has at least some merits for discussion, and could be a good path forward to making inroads to the network performance for our front-end.

djmitche commented 7 years ago

I like this too! It might require a tighter data model on the backend, though. We don't really do any of the referential stuff that REST APIs typically use.

eliperelman commented 7 years ago

Also may be relevant: https://facebook.github.io/relay/

eliperelman commented 7 years ago

Nice introduction to GraphQL: https://blog.pusher.com/rest-versus-graphql

eliperelman commented 7 years ago

Had a meeting today over Vidyo to discuss this more in-depth.

Attendees: @eliperelman @helfi92 @djmitche @imbstack @jonasfj @walac

What problems currently exist that GraphQL may have a solution to?

Right now we have poor database performance and sub-optimal queries when doing large lookups from Tools
Changing APIs for Tools' benefit creates blockers against the backend

Unknowns about GraphQL

Can we abort a request using our own business logic?
Can we refuse GraphQL queries that are inefficient?

Pros for GraphQL

Nice APIs for users
Opens up the possibilities for custom reporting
Easy to modify schema and strongly type API interactions
Automatic tooling and introspection support

Cons

Essentially an abstraction for the database, exposing data structures
Needs careful planning to account for too much consumption (rate limiting) or record counts (page limits)
Not feasible with existing Azure database
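On refusing inefficient queries and capping record counts: one commonly used guard is to reject a query whose selection nests too deeply and to clamp the requested page size before anything is fetched. The following is a minimal sketch under assumptions, not something discussed in the meeting; `QueryShape`, `depthOf`, `checkQuery`, and both limits are hypothetical.

```typescript
// Hypothetical query shape: each field maps to its nested selection (empty for leaves).
type QueryShape = { [field: string]: QueryShape };

// Length of the deepest field path in the selection.
function depthOf(shape: QueryShape): number {
  let deepest = 0;
  for (const field of Object.keys(shape)) {
    const child = shape[field];
    if (child === undefined) continue;
    deepest = Math.max(deepest, 1 + depthOf(child));
  }
  return deepest;
}

const MAX_DEPTH = 6; // assumed limit, not a Taskcluster setting
const MAX_PAGE_SIZE = 100; // assumed limit, not a Taskcluster setting

type QueryCheck = { ok: true; pageSize: number } | { ok: false; reason: string };

// Refuse over-deep queries up front and clamp the page size for accepted ones.
function checkQuery(shape: QueryShape, requestedPageSize: number): QueryCheck {
  const depth = depthOf(shape);
  if (depth > MAX_DEPTH) {
    return { ok: false, reason: `query depth ${depth} exceeds limit ${MAX_DEPTH}` };
  }
  const pageSize = Math.min(Math.max(1, Math.floor(requestedPageSize)), MAX_PAGE_SIZE);
  return { ok: true, pageSize };
}

// A shallow query asking for 1000 records is accepted with a clamped page size.
const shallowCheck = checkQuery({ task: { status: { state: {} } } }, 1000);
```

Depth is a crude proxy for cost; a per-field cost estimate would be a finer-grained variant of the same idea.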

Takeaways

We will reconsider GraphQL in the future once we have made the move to Postgres. We will keep the querying API in mind when working on Postgres.
Most likely the only thing we would consider exposing through GraphQL is the Queue.

djmitche commented 6 years ago

Postgres is #65, btw

eliperelman commented 6 years ago

I have been using GraphQL a lot lately, and find it to be a great fit with React and front-end API interactivity. I decided to take on an experiment to test my hypotheses of a number of benefits GraphQL could bring to the front-end, even without Postgres support, the code for which lives here:

https://github.com/eliperelman/taskcluster-graphql-server

(Some of these are begging the question, but I couldn't resist.)

Could we significantly reduce the amount of data the browser needs to download in order to render?

In downloading the data needed to render a task group, in my rough test it looks as though the download size is approximately 10% of the same request when using the Queue directly. Obviously this data is still downloaded, but it's not done by the browser. Combine this with the ability to page (which we could technically do before), and we can solve our hanging browser issue.

Can we aggregate multiple queries that the browser needs to render a page to a single network request, while still keeping download sizes minimal?

To try this out, in my Task definition I added a status property, and vice-versa, and it worked really well. I can logically get the linked/aggregated data on the client without having to make multiple requests from the browser, nor even have knowledge that the server is doing this aggregation:

query Sample($taskId: ID!) {
  task(taskId: $taskId) {
    status {
      state
    }
  }
}

What's nice is GraphQL is smart about this, and only makes a request for the status information if the client is requesting the property, potentially reducing load on the APIs.
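To make the "only if the client is requesting the property" behaviour concrete, here is a small sketch that mimics per-field resolution without a GraphQL library. `fetchTaskDefinition`, `fetchTaskStatus`, `resolveTask`, the call counters, and the task ID are all hypothetical stand-ins; in a real server the per-field resolvers play the role of the loop.

```typescript
// Counters recording how often each hypothetical upstream call runs.
const upstreamCalls = { task: 0, status: 0 };

// Stand-in for fetching a task definition from the Queue.
function fetchTaskDefinition(taskId: string): { taskId: string; metadata: { name: string } } {
  upstreamCalls.task += 1;
  return { taskId, metadata: { name: "example task" } };
}

// Stand-in for fetching a task's status from the Queue.
function fetchTaskStatus(taskId: string): { taskId: string; state: string } {
  upstreamCalls.status += 1;
  return { taskId, state: "completed" };
}

// Resolve the requested top-level fields of a task. The status call is made
// only when `status` is among the requested fields.
function resolveTask(taskId: string, fields: string[]): { [field: string]: unknown } {
  const task = fetchTaskDefinition(taskId);
  const result: { [field: string]: unknown } = {};
  for (const field of fields) {
    if (field === "status") {
      result["status"] = fetchTaskStatus(taskId);
    } else if (field === "taskId") {
      result["taskId"] = task.taskId;
    } else if (field === "metadata") {
      result["metadata"] = task.metadata;
    }
  }
  return result;
}

// No status requested: the status endpoint is never called.
const withoutStatus = resolveTask("abc123", ["taskId", "metadata"]);
const statusCallsAfterFirst = upstreamCalls.status;

// Status requested: exactly one extra upstream call.
const withStatus = resolveTask("abc123", ["status"]);
```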

This also works to whatever depth we want. For example, if I want to render a task, along with its artifacts, along with their signed URLs, all that logic can be handled on the server, and the query is very transparent for the client:

query Sample($taskId: ID!) {
  task(taskId: $taskId) {
    status {
      state
      runs {
        artifacts {
          edges {
            node {
              url
            }
          }
        }
      }
    }
  }
}

Can we make local credential management simpler and improve how we re-generate client API instances from those credentials?

I didn't realize the benefits of this at first, but it is quite nice. With GraphQL, the browser no longer needs client libraries at all. Every request to the GraphQL server would pass along an auth0 access token if the user is logged in; the server then makes a request to Login and, from the result, generates instances of the taskcluster-client clients. The logic for refreshing access to the Login service is then unnecessary in the browser, and is made generic for all potential web apps that need to communicate with TC with creds.

Not to mention the reduction in bundle sizes from not having to include the client libraries.

Could this be used to reduce the amount of business logic necessary to fetch data consistently?

What's nice about this approach is that consumers *don't even need to know* that there are multiple services behind the scenes powering these queries, mutations, and subscriptions [1]. If you want a task, ask for it, same for hooks, a secret, or whatever.

Tradeoffs?

The biggest tradeoff I see is latency. Having to proxy requests through a server like this is going to introduce a small amount of latency, but I think the tradeoff is acceptable given that the bottleneck against Azure is an order of magnitude or two greater than the latency.

The other is maintaining yet another service, which I am happy to pick up on here since it is designed specifically to support web applications, and only exposes the data they need. I understand that the eventual goal would be to throw this proxy away when we could access all APIs via GraphQL.

Also understand that even with all the benefits here (there are loads more specific to GraphQL itself, including surrounding schemas and documentation), this still wouldn't realize the performance benefits until our other performance bottlenecks were resolved.

Thoughts? I think this would be a huge developer experience win not only for Tools, but all web applications that would need to access Taskcluster APIs, even without all the improvements that Postgres would bring.

[1] Yes, I even did away with the need for manual WebSocket management against events.taskcluster.net, because GraphQL also handles subscriptions. My server experiment connects directly to Pulse.

jonasfj commented 6 years ago

relational API references could be tried with hyper-schema

The way data in GraphQL is relational is awesome. This could also be done with JSON Hyper-Schema, though I won't pretend to fully understand that aspect.

security trade-off is that server would have * scope

Only auth and queue have this at this point. It's true that tools also sort of has it, as it handles credentials client side. Generally, when we make requests using TC creds the server side can't use those creds to make another request. When the requests are authorized with an auth0 token, the server can use that auth0 token for something else (to the extent it's not scoped to a specific site).

This might not be a show stopper.

I suspect the show stopper here is that it would be a second public API, and this would be a monolithic API for all services. Graphql would be a cool thing to do, if we rewrote taskcluster-lib-api to support it, such that all APIs were graphql natively, and we did away with REST APIs. Or if REST and graphql APIs were generated from the same API declaration. However, that's a rather big thing to do, and probably involves doing away with some aspect of micro services.

eliperelman commented 6 years ago

it would be a second public API, and this would be a monolithic API for all services.

Agreed, and I think I see this splintering into 2 issues: supporting GraphQL natively from the APIs, and supporting a GraphQL layer for the web app. That is to say, we should be careful not to conflate the concerns of the 2 concepts.

eliperelman commented 6 years ago

I spoke to @ccooper today about reviving this RFC as a web app-only proposition. I have revised the description accordingly.

With the front-end portion of redeployability coming towards feature completion, that leaves Q2 open for @helfi92 and me to put some much-needed work into continuing our UX improvements for the web application. GraphQL would represent a paradigm shift in the way data is handled by the front-end, and we would like to bake it in from the start so we can reap its benefits immediately without needing another rewrite in the future if the APIs were to support GraphQL. There are still a number of benefits that I believe outweigh the tradeoffs that I previously outlined, and so we are planning to integrate a large chunk of this into the codebase in Q2.

That said, to mitigate security concerns we are going to specifically lock this down with CORS so that it is limited to the web app in this first go, and it will not be deployed to production anywhere until the app is a viable replacement for taskcluster-tools. That is not currently slated to happen in Q2, FYI.
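One way such a CORS restriction could look is an origin allowlist applied when building response headers. This is a sketch under assumptions, not the gateway's actual configuration: `ALLOWED_ORIGINS`, `corsHeadersFor`, and the example origin are placeholders.

```typescript
// Assumed origin of the web app; a placeholder, not a real deployment URL.
const ALLOWED_ORIGINS = ["https://web-app.example.com"];

// Return the CORS headers to send for a request's Origin header, or an empty
// object when the origin is not on the allowlist, in which case a browser
// will not expose the response to the calling page.
function corsHeadersFor(requestOrigin: string | undefined): { [header: string]: string } {
  if (requestOrigin === undefined || ALLOWED_ORIGINS.indexOf(requestOrigin) === -1) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Headers": "Authorization, Content-Type",
    "Vary": "Origin",
  };
}

// The web app's origin gets CORS headers; any other origin gets none.
const webAppHeaders = corsHeadersFor("https://web-app.example.com");
const otherSiteHeaders = corsHeadersFor("https://other-site.example.org");
```

CORS is enforced by browsers only, so a check like this limits which web pages can read responses rather than which clients can send requests.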

This gateway would not be maintained like our other services; it would be part of a server instance specifically designed to support the web app. This in no way affects our efforts on redeployability, and will be developed in parallel to any r14y needs. The original goal is still to ship taskcluster-tools with the initial r14y release.

To be clear, we plan on forging ahead with this for the web app only, starting in Q2, independent of redeployability. Security is still a hot topic, and I will continue to converse about it and improve the situation.

jonasfj commented 6 years ago

Using graphql for internal communication between the tools site and a tools backend removes security and compatibility concerns...

But honestly, I would prefer a tools site without a backend... and just use APIs directly. Our primary purpose is to support automation; humans are second-class citizens. I think we can easily make it good enough with the REST API, and in the few cases where there is too much data, I'm sure we can make some special-cased end-points with filtering to support fast/efficient dashboards.

GraphQL would represent a paradigm shift in the way data is handled by the front-end,

I think this is the best counterargument.

I'm a lot less worried about using it internally in the tools site... as a protocol between tools frontend and backend :)

That said, maybe this is worth talking about on vidyo...

djmitche commented 5 years ago

I think this is pretty much done at this point -- @helfi92 what do you think, should we just close this?

