twitchtv / twirp

A simple RPC framework with protobuf service definitions
https://twitchtv.github.io/twirp/docs/intro.html
Apache License 2.0
7.15k stars 327 forks source link

Add support for custom route prefixes #48

Closed lordnynex closed 6 years ago

lordnynex commented 6 years ago

Hello,

I was really excited to read about twirp and want to undertake a project with it. Unfortunately, twirp seems to make some pretty opinionated decisions about my API's pathing. I've noticed that the generator prepends the literal "twirp" to the url, and seems to require the services internal name be exposed. I'm not familiar with writing proto generators but I assume it is possible to provide some sort of generator option to disable or customize this functionality.

For the hard coded /twirp, I'm wondering if this is a twitch specific thing or if you're open to removing it. Considering I can use my own mux, I'm capable of prepending paths already.

For the service name, it seems a bit less straight forward. It is possible to expose the handlers so they can be plugged directly into the mux. I guess the apprehension is keeping twirp from becoming a complex mux in and of itself, but I think it can be made possible without convoluting the code base.

robmccoll commented 6 years ago

I've been struggling with how to do this as well. It seems like it should be possible to extend google.protobuf.FileOptions with a set of twirp options (https://developers.google.com/protocol-buffers/docs/proto3#custom_options) with a definition like:

// github.com/twitchtv/twirp/extensions/descriptor.proto
syntax = "proto3";

// twirp - simple package name to make usage less verbose
package twirp;

import "google/protobuf/descriptor.proto";

// TwirpFileOptions can be used to customize of twirp-generated code.
// Import this pacakge into your own protobuf definitions, and then
// add one or more lines like this at the file scope:
//   option (twirp.extra).path_prefix = "/prefix/";
message TwirpFileOptions {
  // path_prefix can be used to override the "/twirp/" path prefix
  // with a prefix of your choosing.
  string path_prefix = 1;
}

// Add twirp options to the file scope
extend google.protobuf.FileOptions {
  TwirpFileOptions extra = 58977;
}

And then import that in your service definition and use it:

syntax = "proto3";
package totally.great.service;
import "twitchtv/twirp/extensions/descriptor.proto";

option go_package = "service";
option (twirp.extra).path_prefix = "/v1/service/";

// ... some super great codes here

But it isn't clear to me if or how proto-gen-go handles extensions and how to register them appropriately :-(

Adding it to https://github.com/robmccoll/twirp/blob/master/protoc-gen-twirp/generator.go would be pretty simple IMHO

Of note though: adding something like this would increase the requirements on any further generators that are trying to satisfy twirp.

spenczar commented 6 years ago

Interesting suggestion. I think of Twirp's hardcoded routes as a feature, not a problem.

The guarantee that all Twirp services route the same way is what makes it easy to write clients in other languages. It makes it straightforward to write a cURL command to ping a server with a payload from the command line. I wouldn't really want to lose that.

What are the problems you're running into with routes prefixed with /twirp/? You mentioned that it's an opinionated decision, which is true, but the hope is that you shouldn't need to care about that decision - nobody should.

I'd like to figure out why this feature is needed ahead of designing solutions just yet, but yes, options would probably be a good way to do this in a backwards-compatible fashion.

robmccoll commented 6 years ago

Personally, I see Twirp as a great way to create internal and external API services and clients. This is another advantage for Twirp over gRPC (gRPC's default binary HTTP2 probably isn't broad enough for many clients).

In an external-facing use-case though, it might be nice to version the API route, mux together similar services at different routes, be able to refactor internally without runtime re-routing to account for new package paths, etc.

At the same time, I do see the benefit of saying the paths are part of the protocol and keeping one more potential variation out of the mix. Plus it's good advertising for this quality project :-)

Edit: in case it does become useful, I updated the snippets above for correctness

robmccoll commented 6 years ago

Working patch (needs tests, cleaning, Python):

diff --git a/extensions/descriptor.proto b/extensions/descriptor.proto
new file mode 100644
index 0000000..b041b09
--- /dev/null
+++ b/extensions/descriptor.proto
@@ -0,0 +1,22 @@
+syntax = "proto3";
+
+// twirp - simple package name to make usage from proto less verbose
+package twirp;
+
+option go_package = "extensions";
+import "google/protobuf/descriptor.proto";
+
+// TwirpFileOptions can be used to customize of twirp-generated code.
+// Import this pacakge into your own protobuf definitions, and then
+// add one or more lines like this at the file scope:
+//   option (twirp.extra).path_prefix = "/prefix/";
+message TwirpFileOptions {
+  // path_prefix can be used to override the "/twirp/" path prefix
+  // with a prefix of your choosing.
+  string path_prefix = 1;
+}
+
+// Add twirp options to the file scope
+extend google.protobuf.FileOptions {
+  TwirpFileOptions extra = 58977;
+}
diff --git a/internal/gen/main.go b/internal/gen/main.go
index 181609b..3b29bd8 100644
--- a/internal/gen/main.go
+++ b/internal/gen/main.go
@@ -8,6 +8,7 @@ import (
        "github.com/golang/protobuf/proto"
        "github.com/golang/protobuf/protoc-gen-go/descriptor"
        plugin "github.com/golang/protobuf/protoc-gen-go/plugin"
+       _ "github.com/twitchtv/twirp/extensions"
 )

 type Generator interface {
diff --git a/protoc-gen-twirp/generator.go b/protoc-gen-twirp/generator.go
index 7c9ca5d..244f301 100644
--- a/protoc-gen-twirp/generator.go
+++ b/protoc-gen-twirp/generator.go
@@ -16,6 +16,7 @@ import (
        "github.com/golang/protobuf/protoc-gen-go/descriptor"
        plugin "github.com/golang/protobuf/protoc-gen-go/plugin"
        "github.com/pkg/errors"
+       "github.com/twitchtv/twirp/extensions"
        "github.com/twitchtv/twirp/internal/gen"
        "github.com/twitchtv/twirp/internal/gen/stringutils"
        "github.com/twitchtv/twirp/internal/gen/typemap"
@@ -35,6 +36,8 @@ type twirp struct {
        genPkgName          string // Name of the package that we're generating
        fileToGoPackageName map[*descriptor.FileDescriptorProto]string

+       pathPfx string // Prefix used on URL routes
+
        // List of files that were inputs to the generator. We need to hold this in
        // the struct so we can write a header for the file that lists its inputs.
        genFiles []*descriptor.FileDescriptorProto
@@ -86,6 +89,13 @@ func (t *twirp) Generate(in *plugin.CodeGeneratorRequest) *plugin.CodeGeneratorR
        }
        t.genPkgName = genPkgName

+       // Then check for a consistent prefix override (optional, must match if present)
+       pathPfx, err := deducePathPrefixOrDefault(t.genFiles, "/twirp/")
+       if err != nil {
+               gen.Fail(err.Error())
+       }
+       t.pathPfx = pathPfx
+
        // Next, we need to pick names for all the files that are dependencies.
        for _, f := range in.ProtoFile {
                if fileDescSliceContains(t.genFiles, f) {
@@ -163,6 +173,42 @@ func deduceGenPkgName(genFiles []*descriptor.FileDescriptorProto) (string, error
        return genPkgName, nil
 }

+// deducePathPrefixOrDefault checks for an overriding path setting in all files.
+// If it is set anywhere, it must be set everywhere and must match.
+// If it is not set anywhere, the default is returned.
+func deducePathPrefixOrDefault(genFiles []*descriptor.FileDescriptorProto, defaultPfx string) (string, error) {
+       var pathPrefix string
+       var empties int
+       for _, f := range genFiles {
+               iface, err := proto.GetExtension(f.GetOptions(), extensions.E_Extra)
+               if err == proto.ErrMissingExtension {
+                       empties++
+                       continue
+               } else if err != nil {
+                       return "", errors.Errorf("could not parse twirp file options extension: %v", err)
+               }
+               fileOpts, ok := iface.(*extensions.TwirpFileOptions)
+               if !ok {
+                       return "", errors.Errorf("twirp file options extension did not match expected type (got %T)", iface)
+               }
+               if fileOpts.PathPrefix == "" {
+                       empties++
+                       continue
+               }
+               if pathPrefix != "" && fileOpts.PathPrefix != pathPrefix {
+                       return "", errors.Errorf("multiple file options path prefixes found (%v vs %v)", fileOpts.PathPrefix, pathPrefix)
+               }
+               pathPrefix = fileOpts.PathPrefix
+       }
+       if empties > 0 && pathPrefix != "" {
+               return "", errors.Errorf("some files contained path prefix %v, some did not", pathPrefix)
+       }
+       if pathPrefix == "" {
+               return defaultPfx, nil
+       }
+       return pathPrefix, nil
+}
+
 func (t *twirp) generate(file *descriptor.FileDescriptorProto) *plugin.CodeGeneratorResponse_File {
        resp := new(plugin.CodeGeneratorResponse_File)
        if len(file.Service) == 0 {
@@ -868,14 +914,14 @@ func (t *twirp) generateServer(file *descriptor.FileDescriptorProto, service *de
 // pathPrefix returns the base path for all methods handled by a particular
 // service. It includes a trailing slash. (for example
 // "/twirp/twitch.example.Haberdasher/").
-func pathPrefix(file *descriptor.FileDescriptorProto, service *descriptor.ServiceDescriptorProto) string {
-       return fmt.Sprintf("/twirp/%s/", fullServiceName(file, service))
+func (t *twirp) pathPrefix(file *descriptor.FileDescriptorProto, service *descriptor.ServiceDescriptorProto) string {
+       return path.Join("/", t.pathPfx, fullServiceName(file, service))
 }

 // pathFor returns the complete path for requests to a particular method on a
 // particular service.
-func pathFor(file *descriptor.FileDescriptorProto, service *descriptor.ServiceDescriptorProto, method *descriptor.MethodDescriptorProto) string {
-       return pathPrefix(file, service) + stringutils.CamelCase(method.GetName())
+func (t *twirp) pathFor(file *descriptor.FileDescriptorProto, service *descriptor.ServiceDescriptorProto, method *descriptor.MethodDescriptorProto) string {
+       return t.pathPrefix(file, service) + stringutils.CamelCase(method.GetName())
 }

 func (t *twirp) generateServerRouting(servStruct string, file *descriptor.FileDescriptorProto, service *descriptor.ServiceDescriptorProto) {
@@ -886,7 +932,7 @@ func (t *twirp) generateServerRouting(servStruct string, file *descriptor.FileDe
        t.P(`// `, pathPrefixConst, ` is used for all URL paths on a twirp `, servName, ` server.`)
        t.P(`// Requests are always: POST `, pathPrefixConst, `/method`)
        t.P(`// It can be used in an HTTP mux to route twirp requests along with non-twirp requests on other routes.`)
-       t.P(`const `, pathPrefixConst, ` = `, strconv.Quote(pathPrefix(file, service)))
+       t.P(`const `, pathPrefixConst, ` = `, strconv.Quote(t.pathPrefix(file, service)))
        t.P()

        t.P(`func (s *`, servStruct, `) ServeHTTP(resp `, t.pkgs["http"], `.ResponseWriter, req *`, t.pkgs["http"], `.Request) {`)
@@ -911,7 +957,7 @@ func (t *twirp) generateServerRouting(servStruct string, file *descriptor.FileDe
        t.P()
        t.P(`  switch req.URL.Path {`)
        for _, method := range service.Method {
-               path := pathFor(file, service, method)
+               path := t.pathFor(file, service, method)
                methName := "serve" + stringutils.CamelCase(method.GetName())
                t.P(`  case `, strconv.Quote(path), `:`)
                t.P(`    s.`, methName, `(ctx, resp, req)`)
mkabischev commented 6 years ago

I'm not sure if I understood your question correctly, but twirp's New*Server func returns struct which also implements http.Handler interface. It means that you can use any mux to add any custom prefixes to the url.

robmccoll commented 6 years ago

@mkabischev that is true, but not quite what we're asking about.

The goal is not to just add a prefix, but to replace the automatically generated /twirp/ or /service.path.ServiceName/ prefixes that twirp expects with something else. You could route via whatever paths you wanted outside (say /v1/api/Users/Create), but you would have to strip off your custom path and rebuild the /twirp/github.com.user.project.User/Create path and replace the one in the request before calling ServeHTTP(). This is because the generated ServeHTTP() has hard coded paths in it in a switch statement that it uses to determine which method to call. We would like to be able to reconfigure how those static strings routes in the switch statement are generated. Does that make sense?

spenczar commented 6 years ago

@robmccoll I still don't think I fully understand the problem you're trying to solve with custom routes.

What makes /v1/api/Users/Create any better than /twirp/github.com.user.project.User/Create? Twirp APIs all work on POST requests, so you'd certainly never have people visiting the URL in their browser. This stuff should all be happening in the background. Why would you care what the URLs are?

You mentioned three things:

For versioning, additions can always be made safely to a Twirp API. Add new methods and new fields as you wish - there's no risk of breaking clients there.

Removals and changes to the meaning of APIs can break client behavior, so typically you make a new service (service UsersV1 and service UsersV2) or you make a new package package users.v1 and package users.v2. Either way, Twirp's routing scheme helps you here - it will serve the new API on a new route automatically.

For muxing, @mkabischev explained the approach well I think. The fixed prefix lets you mount the service next to others pretty easily.

For refactoring, I guess I'm not clear on the problem here again. In what scenario would you be changing the packaging but not want clients to know? In Twirp, you think of the service's naming scheme as part of its contract with clients, so it is certainly hard to change behind their backs, but I don't think you lose anything from that.

Could you help me understand what the problems are you're running into? As before, I'm not worried about the technical aspect of implementing it right now.

lordnynex commented 6 years ago

Thanks for considering this. I'll try and sum up my personal reasoning. The scenario is not limited to internal communications between services. In my case I'm hoping to leverage twirp as a public client facing framework.

A few other thoughts I have on this are

lordnynex commented 6 years ago

Also just for transparency, I'm evaluating twirp against https://github.com/grpc-ecosystem/grpc-gateway

I think you've got an awesome framework and I want to see it grow. I'm really happy to see the project and I think it might help you to know what frameworks and business issues you're up against.

spenczar commented 6 years ago

@lordnynex Thanks for a detailed and thoughtful response! This is great.

security and leaking internal names

I've heard this a few times and I'm not super clear on it. What's the threat model here?

re-tooling client code

I completely agree that this is harder with Twirp's fixed routes, and it's just a downside we have to acknowledge. I suspect you'd need to make other changes to clients too, though: in my experience, it's rarely simple to completely describe the message types of a hand-written API through Protobuf specs.

Even if you can, are all of your routes POST-only? That's all Twirp will accept.

And even if you only use POST, and all your types can be represented simply in Protobuf, Twirp returns errors in its own way, almost certainly different from what you have.

In the end, there's a lot of ways a hand-written API will be different from Twirp. Solving just one of them (routing) doesn't make things much easier. Solving all of them makes Twirp meaningless, as it loses its simplicity.

In practice, my recommendation is to serve your API at two different routes and migrate clients from the old API to the new one. This is slow and painful, but it's just the way it is.

infrastructure cost

I can't tell how this applies to Twirp's fixed routes. Is this line of argument that "rewriting paths through proxies is not a good solution?" I would agree with that, yes, it has high costs. Definitely a case-by-case thing.

On the other hand, it can be done in pure code in your application. You could use URL-rewriting middleware, much like the standard library's net/http.StripPrefix, to map URLs however you like. I don't think you'd be using Twirp for its full benefits at that point, though.

request routing and discovery

I don't understand this one too well and would like to know more. Twirp standardizes this much more than most hand-written APIs. You can tell exactly what the package and name of the service are, as well as the method, just from the URL. You can similarly go the other direction. Isn't that easier for discovery? I could be missing something here.

currently no well supported framework to do javascript grpc

Yeah, but it's not hard at all to hand-write a client of a Twirp API (since, after all, the routing is fixed). I think it's quite a bit simpler than hand-writing a REST API client. Whenever I have to make a client for a REST API, I need to read a whole lot of documentation to do it. https://github.com/twitchtv/twirp/pull/31 is a proof of how much simpler Twirp is: someone was able to make a valid Twirp client generator in an evening.

I don't want the framework developer to make such critical decisions for my business. It is up to me the developer to have a map of the API.

This is a deep philosophical difference, I think. I don't believe routes are an important part of API design. The important parts are what methods you expose, what types you use to communicate, how you manage state, that sort of thing. HTTP minutiae like URLs and methods are unimportant almost all the time for the actual application - all you care about is that they are consistent.

hardcoding a /twirp prefix is just advertising

I think that's a bit of an unfair jab. We use the /twirp prefix to guarantee that we don't collide with other routes (espcially grpc's routes, which are mounted at /<service>/<method>, so we know they would colide) on the same mux.

We also use it to reserve a namespace for a reflection API. This is currently half-implemented - services can provide the protobuf descriptor from which they were generated; we haven't worked out exactly how to provide it over the network but I imagine we would do it under /twirp/.

"Advertising" is certainly not one of the design goals of Twirp. Instead, a consistent prefix seemed like the simplest solution to avoid collisions and reserve space for meta-APIs.

lordnynex commented 6 years ago

What's the threat model here?

A quick search of OWASP details leakage primarily through error messages, so I don't think a link is relevant. From a corpsec perspective using service names that expose github private repo hosting gives some information that may be used for targeted phishing. I do want to point out that there have been several very high profile hacks that are the result of not patching a framework after CVE was released. The most noteworthy I can think of is Equifax getting hacked over failing to patch Apache Struts. While the frameworks are not necessarily comparable there is a history of this same vector affecting generated code in Java. I believe it was the Java Swagger generator last year. So I guess it's two fold; corpsec from disclosure of internal names and possibly locations AND appsec from advertising a front side service is using a specific framework.

Even if you can, are all of your routes POST-only? That's all Twirp will accept.

No.. Good catch 😄 ! But arguably it's easier to change a verb and request body (in react native atleast). There would certainly need to be a lot of changes to implement. I will say in a native app, these paths are usually hard coded constants. Changing them is not an impossible problem, but maintaining backwards compatibility will be tough. To be honest even if you accepted this feature request, the ability to notate HTTP verbs is likely the next feature request. Admittedly, MOST of the time the body of an error response is discarded. Generally theres an internal map of HTTP codes->business reasons. I can't say it's a good practice, but it's common. I assume there will be some integration pain here as well.

Is this line of argument that "rewriting paths through proxies is not a good solution?"

Yes. In one example, AWS has launched their Fargate service which is essentially ECS without the instance overhead. There is deep integration with their ALB which allows you to route requests based on a bunch of criteria. It's sortof unattractive in some environments to have to stick an nginx proxy or something in between. It's not an impossible solution, my point is it comes at an unnecessary cost if I have the ability to make my API routes look like what twirp is replacing. For a startup, the cost of a layer of nginx servers is impactful.

I don't believe routes are an important part of API design

Respectfully this is untrue. I do agree that at dev time it can be inconsequential, but going to production is a different game. I've spent a great deal of time in Devops and this belief has caused a lot of pain on an integration level. Most recently I worked for a large company that has this idea of 'cross functional teams'. Things like path prefix etc was removed entirely from their control. From an orchestration and discovery perspective that path had a lot of meaning. The context for this particular implementation is a large apache mesos cluster where applications are completely ephemeral. It's also not uncommon for companies to implement multiple load balancers inside of a cluster dedicated to an entire service. Being able to control at least path prefix here helps with small things that you may be overlooking like writing generic automation around healthchecks.

Arguably, twirp is not a mux and it's goal seems to not over complicate the matter. I appreciate that a lot, but I do want to point out that a lot of really amazing work has gone into different routers/frameworks and they may be better suited for doing even simple path matching. Being able to wire directly to the twirp handler allows me to reap the benefits of two good projects together. For example, I believe frameworks like echo etc have invested quite a bit into using Trie based path matching and caching. This effort can only enhance the performance and scale of twirp.

I think that's a bit of an unfair jab.

To be clear I don't mean the last point as a jab or to insinuate advertising is your goal. It is a hotly debated thing going on right now in certain projects in the go community. It's my goal to be honest and transparent with you about how I'm evaluating and intending to use the software you created so you can make well informed decisions.

spenczar commented 6 years ago

@lordnynex, I think you might be looking for a different thing than Twirp provides. There isn't really much that Twirp does other than handle routing and a prescription around serialization. If we drop the routing, it's just a serializer/deserializer... but all of that work is handled by github.com/golang/protobuf anyway, so it there's kinda nothing left.

I think what you're describing sounds interesting and there's (of course) room for multiple approaches for these things. Twirp's approach is to force standard URLs for services. If you care deeply about your URLs, and it sounds like you do, Twirp is probably not a good fit, and that's okay.

I don't mean the last point as a jab or to insinuate advertising is your goal.

Yeah, sorry I got snippy about this. I should remember to eat lunch before I get grouchy :). It's all good.

robmccoll commented 6 years ago

@spenczar sorry I missed all the updates all day. Probably shouldn't have added the patch - that was really intended to be a record for me of the changes.

I think you are underselling here:

There isn't really much that Twirp does other than handle routing and a prescription around serialization. If we drop the routing, it's just a serializer/deserializer... but all of that work is handled by github.com/golang/protobuf anyway, so it there's kinda nothing left.

Twirp does and can do much more than that. Twirp makes it trivial to generate a functioning server and what are effectively the beginnings of client SDKs in multiple languages (Go and Python now, but as with #31 and #19 - pretty much any language can have a generator in hardly any time). It works efficiently and simply over HTTP and can still be hand-coded easily. Twirp is more Go-thonic than gRPC. It's small, it's opinionated, it's quick, and it's productive.

As for being able to define the path, having a sane predictable default, but also being able to replace it to match the preferences of the organization still seems valuable to me (for most of the reasons @lordnynex explained more eloquently than I can). The beauty is that the proto description is still the ground truth and the only thing you would need to either render or hand-code a client.

For the record, I find both you and @lordnynex to be quite personable and reasonable in this exchange, so if this is you in a hangry state... you're probably in the top percentile for open source devs haha

spenczar commented 6 years ago

Well, thank you @robmccoll for the kind words. Every discussion I've had so far on this issue tracker has been really constructive and people have been very reasonable, and this is no exception.

You mention that Twirp today is simple, small, and opinionated, and that generators are quick to write. We would lose ground on all of those points if we were to support custom URL prefixes. An implementation would likely be based on options, and it's up to the generator-implementer to know how to use them - which means more documentation, but also more ways to slip up when implementing a client.

Today, I can write the spec for Twirp on the back of a cocktail napkin. We can put stuff like "here's how you make a cURL request" in documentation without any caveats. If we have to include conditional statements, like "... unless the .proto file specifies a different prefix, in which case..." then things grow in complexity very, very quickly.

I am also concerned that if we allow custom URL prefixes, it becomes harder to answer people who ask questions like:

All of these could be implemented through options in the .proto file. You'd still have one ground truth. But you'd lose the simplicity of Twirp: it would cease to be the so-simple-your-cat-can-figure-it-out framework, and instead become a kinda complex tool for describing APIs. It would become a not-very-good competitor to Swagger.

So, we have to draw the line somewhere, and the simplest place to draw it is to say that Twirp calls the shots on routing. No exceptions - it's just static and simple.

I missed this bit, @lordnynex:

Being able to control at least path prefix here helps with small things that you may be overlooking like writing generic automation around healthchecks.

For what it's worth, I think it's easier to new automation for Twirp endpoints, but I can appreciate that it's harder to reuse your existing automation. But there's nothing wrong with adding non-Twirp handlers on muxes right next to your Twirp services. I see /healthcheck-type handlers next to Twirp all the time, and have no problem with that.

spenczar commented 6 years ago

Related: #55, which proposes removing the /twirp prefix, but still uses a fixed routing scheme.