Thanks @mdeicas!
For the questions:
Pagination is being introduced on the GQL side via https://github.com/guacsec/guac/issues/1525. If the REST API implements a query requiring multiple backend queries (and depending on the output), it would make sense to introduce pagination on the REST API side from the beginning so that we do not run into rework down the line.
I would assume that the chosen framework will already support authentication as part of its implementation. We could start without it (as we have with the experimental server) and add it later as needed.
The standard approach is to define the schema with OpenAPI and generate server and client code.
Two code generators seemed the most promising: deepmap/oapi-codegen and OpenAPITools/openapi-generator. For various reasons, I ruled out the other generators I evaluated. In short, deepmap/oapi-codegen is better than OpenAPITools/openapi-generator, chiefly because it generates a strongly typed server interface (compare the generated code below). The drawback of deepmap/oapi-codegen is that it can only generate client code in Go; to generate client code for other languages, OpenAPITools/openapi-generator would need to be used. Both of these tools use the Apache license.
For reference, here is some of the generated code for an OpenAPI endpoint, `SearchPackageNames`, that takes a single string parameter and outputs a list of purls.
The server interface from deepmap/oapi-codegen looks like:
```go
type SearchPackageNamesResponseObject interface {
	VisitSearchPackageNamesResponse(w http.ResponseWriter) error
}

// generated code implements the above interface
type SearchPackageNames200JSONResponse PurlList

// generated code implements the above interface
type SearchPackageNamesdefaultJSONResponse struct {
	Body       Error
	StatusCode int
}

type StrictServerInterface interface {
	// (GET /search/packages/names)
	SearchPackageNames(ctx context.Context, request SearchPackageNamesRequestObject) (SearchPackageNamesResponseObject, error)
}
```
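To make the shape of this interface concrete, here is a minimal sketch of what an implementation might look like (the `server` struct and `searchBackend` helper are hypothetical; the `request.Params.Name` field follows oapi-codegen's naming conventions for query parameters):

```go
type server struct{}

// searchBackend is a hypothetical helper that resolves a package name to purls.
func (s *server) searchBackend(ctx context.Context, name string) ([]string, error) {
	return []string{"pkg:golang/github.com/guacsec/guac"}, nil
}

func (s *server) SearchPackageNames(ctx context.Context, request SearchPackageNamesRequestObject) (SearchPackageNamesResponseObject, error) {
	purls, err := s.searchBackend(ctx, request.Params.Name)
	if err != nil {
		// The generated default response maps onto the Error schema.
		return SearchPackageNamesdefaultJSONResponse{
			Body:       Error{Code: 500, Message: err.Error()},
			StatusCode: 500,
		}, nil
	}
	return SearchPackageNames200JSONResponse(purls), nil
}
```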
And the server interface from OpenAPITools/openapi-generator looks like:
```go
type ImplResponse struct {
	Code int
	Body interface{}
}

type DefaultAPIServicer interface {
	SearchPackageNames(context.Context, string) (ImplResponse, error)
}
```
Oapi-codegen supports echo by default, and support for chi, gin, mux, fiber, and iris has been added by the community. However, oapi-codegen can only generate the strongly typed interface shown above for chi, gin, and echo.
Chi, gin, and echo are all fast and commonly used. Echo and gin are more full-featured web frameworks, while chi is an improved version of the net/http router.
Gin and echo provide their own context types, which makes the GUAC style of passing the logger through the context (i.e. `logger := logging.FromContext(ctx)`) a bit more complicated. They both provide ways to configure and use loggers, but it might be better and more consistent to log in the same way as the rest of the codebase. In short, this can be done with echo by passing the logger through the `http.Request` context that is nested in the `echo.Context`, but this doesn't work with gin. It can be made to work, but only if oapi-codegen is configured to generate a less strongly typed interface.
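For example, a minimal sketch of the echo approach (assuming GUAC's logging package provides a `WithLogger` counterpart to `logging.FromContext`; treat the exact helper name as an assumption):

```go
// Echo middleware that stores the logger in the nested *http.Request context,
// so handlers can keep calling logging.FromContext(ctx) as elsewhere in GUAC.
func contextLogger() echo.MiddlewareFunc {
	return func(next echo.HandlerFunc) echo.HandlerFunc {
		return func(c echo.Context) error {
			req := c.Request()
			ctx := logging.WithLogger(req.Context())
			c.SetRequest(req.WithContext(ctx))
			return next(c)
		}
	}
}
```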
Chi is more lightweight, but it still serves all of our use cases. It is compatible with Go's net/http package, so the server could always be modified to use another framework in the future without reimplementing the middleware.
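For comparison, wiring the generated chi server with plain net/http middleware might look like the following sketch (`NewStrictHandler` and `HandlerFromMux` are the names oapi-codegen generates for the chi target, so treat them as assumptions):

```go
// Mount the generated handler on a chi router. Any middleware written against
// net/http (including GUAC-style logger injection) plugs in unchanged.
func newServer(impl StrictServerInterface) http.Handler {
	r := chi.NewRouter()
	r.Use(middleware.Logger) // or any custom net/http middleware
	return HandlerFromMux(NewStrictHandler(impl, nil), r)
}
```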
So I think either echo or chi would be good options.
There is another technique for exposing both a REST API and an RPC API with one service implementation. It involves implementing a gRPC API as usual, and then using protoc with the grpc-gateway plugin to generate a REST API server from an annotated protobuf API specification. This generated server simply forwards requests to the gRPC server. Grpc-gateway can also generate an OpenAPI v2 schema for the generated server. There is a helpful diagram of this design in the grpc-gateway README.
Client code, generated with protoc, makes requests directly to the gRPC server. The REST HTTP server would then only be used to serve a webpage or by individual users (e.g. with curl), as programs could use the gRPC client code directly. In the absence of such use cases, the REST HTTP server does not need to run.
The protobuf service definition needs to be annotated with the mapping of RPC methods to HTTP endpoints by adding a google.api.http option. There is some documentation on this transcoding at https://google.aip.dev/127 and in the protobuf it links to. It looks like this:
```protobuf
message PackageName {
  string name = 1;
}

message PurlList {
  repeated string value = 1;
}

service Guac {
  rpc SearchPackageNames (PackageName) returns (PurlList) {
    option (google.api.http) = {
      get: "/search/packages/names"
    };
  }
}
```
The interface to implement is standard, as generated by protoc:
```go
type GuacServer interface {
	SearchPackageNames(context.Context, *PackageName) (*PurlList, error)
}
```
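To illustrate the two-server setup, here is a sketch of the gateway wiring (`RegisterGuacHandlerFromEndpoint` is the function grpc-gateway would generate for the Guac service above; the ports and dial options are illustrative):

```go
// The gRPC server listens on :9000 as usual; the generated gateway serves
// REST on :8080 and forwards each request to the gRPC server.
func runGateway(ctx context.Context) error {
	mux := runtime.NewServeMux()
	opts := []grpc.DialOption{grpc.WithTransportCredentials(insecure.NewCredentials())}
	if err := RegisterGuacHandlerFromEndpoint(ctx, mux, "localhost:9000", opts); err != nil {
		return err
	}
	return http.ListenAndServe(":8080", mux)
}
```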
The main drawback of this approach is the overhead of running two new servers, which would be magnified in a production environment. The main benefit is staying in the gRPC ecosystem, which GUAC is already familiar with. Another note is that gRPC natively supports returning large lists of results with server-streaming RPCs, which could eliminate the need to add pagination to the API.
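For example, if the method were instead declared as server-streaming (returning `stream Purl`, with a hypothetical `Purl` message), the generated handler could send results incrementally rather than building one large list. This sketch assumes the stream type name protoc would generate:

```go
// Hypothetical server-streaming handler; s.lookup is a stand-in for the real
// query, and Guac_SearchPackageNamesServer would be generated by protoc.
func (s *guacServer) SearchPackageNames(req *PackageName, stream Guac_SearchPackageNamesServer) error {
	for _, purl := range s.lookup(req.GetName()) {
		if err := stream.Send(&Purl{Value: purl}); err != nil {
			return err
		}
	}
	return nil
}
```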
Grpc-gateway provides another way to serve a REST API once a gRPC API has been implemented, but with a single server instead of two. It generates boilerplate code that adapts each incoming request so that it can be handled by the gRPC handler instead of a regular HTTP handler. This approach results in a single REST HTTP server, implemented by way of a gRPC service handler. As before, an OpenAPI schema that specifies the server can also be generated.
This approach is a bit awkward because it mixes paradigms: the API is specified in protobuf, but all of the middleware is implemented as HTTP server middleware. Furthermore, gRPC features such as streaming RPCs are not supported, and only a single HTTP web framework, mux, is supported. Finally, because the running server is a REST HTTP server, client code cannot be generated by protoc; another OpenAPI code generator, such as those discussed in the first approach, would have to be adopted.
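For reference, the single-server variant registers the service implementation with the HTTP mux in-process (`RegisterGuacHandlerServer` is the registration function grpc-gateway generates alongside the endpoint-based one):

```go
// No separate gRPC listener: the generated mux translates each HTTP request
// and calls the GuacServer implementation directly. Streaming RPCs are not
// supported on this path.
func runSingleServer(ctx context.Context, impl GuacServer) error {
	mux := runtime.NewServeMux()
	if err := RegisterGuacHandlerServer(ctx, mux, impl); err != nil {
		return err
	}
	return http.ListenAndServe(":8080", mux)
}
```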
I think the decision comes down to how much overhead results from running two servers versus how much benefit is gained from staying in the gRPC + protobuf ecosystem, as the implementation effort for either approach (choosing oapi-codegen for the REST one) seems fairly equivalent. In my opinion, the simpler approach of directly implementing the REST HTTP server seems better.
Thanks for writing this detailed analysis @mdeicas !
I think having a proto definition, which we already use in GUAC, would be nice, but like you said it mixes the paradigms, and I think it introduces a world where you'd want to use gRPC for some things and REST for others, which ends up with neither being "first-class". Since most policy engines will likely use REST (due to the overhead of creating a gRPC client), native support there should be the priority, which would also mean having to add pagination on top of a gRPC streaming implementation anyway.
RE: REST API
> The drawback of deepmap/oapi-codegen is that it can only generate client code in Go.

I think I misunderstood this the first time round, so I want to clarify: this means that no other language has a client code generator, not that Go gets only client codegen and no server codegen.
Given this is REST, I don't think there's tight coupling for clients, so it should be OK for other languages to use a different codegen; it should still work, yea?
It looks like both projects are widely used and maintained. It also looks like the cost of switching frameworks would likely be low if for some reason we have to do it, so I think we should go ahead with whichever we feel most comfortable with.
RE: Web frameworks
I don't think we have too many requirements here; our bottleneck will likely be the backend, so IMO the simpler the better. If we do have to switch, we should pick one that is also supported by the other preferred codegen, OpenAPITools/openapi-generator, which seems to support net/http, gin, and echo.
Would love to see comments from other folks who have maybe used some of these libraries/generators before.
> > The drawback of deepmap/oapi-codegen is that it can only generate client code in Go.
>
> I think I misunderstood this the first time round, so I want to clarify: this means that no other language has a client code generator, not that Go gets only client codegen and no server codegen.
That's right, sorry for the confusion. deepmap/oapi-codegen generates Go server code and Go client code, but not Java or Rust client code.
> Given this is REST, I don't think there's tight coupling for clients, so it should be OK for other languages to use a different codegen; it should still work, yea?
Yup I think this would be fine.
Adding the OpenAPI Spec I used to generate the examples above for reference:
```yaml
openapi: "3.0.0"
paths:
  "/search/packages/names":
    get:
      summary: Search packages by package name
      operationId: searchPackageNames
      parameters:
        - name: name
          in: query
          description: the name to search for
          required: true
          style: form
          schema:
            type: string
      responses:
        "200":
          description: A list of purls that match the search
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/PurlList"
        default:
          description: unexpected error
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Error"
components:
  schemas:
    Purl:
      type: string
    PurlList:
      type: array
      items:
        $ref: "#/components/schemas/Purl"
    Error:
      type: object
      required:
        - code
        - message
      properties:
        code:
          type: integer
          format: int32
        message:
          type: string
```
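For completeness, a sketch of what calling this endpoint through the oapi-codegen-generated Go client might look like (the type and method names follow oapi-codegen's conventions and should be treated as assumptions):

```go
func searchExample(ctx context.Context) error {
	client, err := NewClientWithResponses("http://localhost:8080")
	if err != nil {
		return err
	}
	resp, err := client.SearchPackageNamesWithResponse(ctx, &SearchPackageNamesParams{Name: "log4j"})
	if err != nil {
		return err
	}
	if resp.JSON200 == nil {
		return fmt.Errorf("unexpected status: %d", resp.StatusCode())
	}
	for _, purl := range *resp.JSON200 {
		fmt.Println(purl)
	}
	return nil
}
```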
We're going to go with a standard REST API because it is the simplest option, and use oapi-codegen over OpenAPITools/openapi-generator for the reasons outlined in a previous comment. Either echo or chi would be a good option, but we'll go with chi for now because it uses the standard library context and http handlers, which makes things a bit simpler.
@mdeicas thanks a lot for this analysis.
@dejanb and I have been investing in creating some new (GQL) endpoints in a fork of GUAC for some specific use cases we tackled.
One piece of feedback I can share is that our initial approach was exactly what has been proposed here, i.e. connecting to the GQL ontology endpoints so that our new endpoints would be available no matter which GUAC backend is running (and could then easily be contributed upstream).
The issue with this approach, at least for us, was performance: loading all the data needed to run the correlations and build the response was heavily memory- and time-consuming with the Ent backend.
In the end, we had to abandon the "GQL ontology endpoints" approach and instead write specific Ent queries directly to cover the requirements of our use cases.
The drawback of optimizing new endpoints by letting them interact directly with the backend is that you have to provide an implementation of each REST endpoint for each backend, but I think that is an expected consequence of having multiple backends.
If the requirement that REST endpoints interact only with GQL is mandatory, I can see two options when implementing a new REST endpoint.
If that requirement is NOT mandatory, then each REST endpoint could be allowed either to leverage the GQL ontology endpoints OR to connect directly to the backend.
I would like to collect everyone's feedback on this (@mdeicas @pxp928 @lumjjb)
By allowing REST endpoints to use the backend directly, we can open things up for experimenting with new use cases and advanced queries. Once those use cases gain traction and prove to be generally useful, we can turn them into GQL queries that the other backends should implement.
Yup I think these are good points. Using the ontology API as a level of indirection also limits the use of capabilities that each datastore may have, such as native graph traversal queries.
I think we should support a default implementation of most or all REST API endpoints using the ontology API, to serve as a reference and for the inmem backend.
After that, to support endpoints that depend directly on datastores, I agree with @dejanb: first add them to the REST API, with a way to swap in the optimized implementation depending on which backend is being used. This avoids the GQL schema changes that would be necessary in the alternative approach, which is beneficial because changing the schema is not as lightweight. And yes, if one of these endpoints becomes generally useful, we can decide then whether to promote it to the GQL API.
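As a sketch of what that swap could look like (all names here are hypothetical; the default path goes through the GQL ontology API, and a backend can substitute an optimized implementation):

```go
// SearchBackend is a hypothetical per-endpoint abstraction.
type SearchBackend interface {
	SearchPackageNames(ctx context.Context, name string) ([]string, error)
}

// gqlSearch is the default implementation, built on the GQL ontology API.
type gqlSearch struct{ client graphql.Client }

func (g *gqlSearch) SearchPackageNames(ctx context.Context, name string) ([]string, error) {
	// ...call the existing GQL queries and flatten the results to purls.
	return nil, nil
}

// entSearch is a hypothetical optimized implementation using native Ent queries.
type entSearch struct{}

func (e *entSearch) SearchPackageNames(ctx context.Context, name string) ([]string, error) {
	// ...run a single optimized Ent query instead of many GQL round trips.
	return nil, nil
}

func newSearchBackend(backend string, client graphql.Client) SearchBackend {
	if backend == "ent" {
		return &entSearch{}
	}
	return &gqlSearch{client: client}
}
```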
Yea, +1. Ideally I think it should have a standard interface, although given the experimental nature of this, and to better understand optimization issues, I think it would make sense to work directly with some backends only, and then slowly evolve towards a v2 of the ontology interface (GQL or otherwise) that will work effectively for this.
#1326 and the design doc provide the motivation for developing a new REST API in GUAC. A proof-of-concept API server has also already been added to GUAC. This issue is to discuss how to move towards a more production-ready solution.
Motivation
In general, the new API can address the limitations of the GQL API (i.e. the ontology API). To reiterate the points made in the linked issue and the design doc, these limitations include:
1. results are not returned in convenient formats, such as purls;
2. higher-level analysis capabilities, such as those in Guacone, are not exposed;
3. the API is GraphQL rather than REST.
A REST API addresses [3], adding parsing/formatting capabilities (such as returning results in purl format) addresses [1], and adding analysis endpoints (such as those in Guacone) addresses [2].
Vision
The proposed vision for the server is that it will be a grab bag of capabilities contributed by the community, motivated by specific use cases, and not a reimplementation of the ontology API in REST. Users will be able to choose between querying the ontology via the GQL API and querying more use-case-specific endpoints via the REST API.
The alternative to this is that the API also serves the ontology. However, the GQL API already exists and I don't think there is any motivation to add this now -- it could always be added in the future.
Requirements
With the above vision in mind, here are some requirements for the new API and server.
Additionally, the frameworks used should be fast, modern, and use acceptable licenses.
Some questions to resolve are whether pagination should be supported from the start and how authentication should be handled.
I'll follow up with some thoughts on which frameworks and code generators should be used.