twitchtv / twirp

A simple RPC framework with protobuf service definitions
https://twitchtv.github.io/twirp/docs/intro.html
Apache License 2.0
7.06k stars 328 forks source link

Spec doesn't mention CamelCasing the URL #244

Closed ofpiyush closed 3 years ago

ofpiyush commented 4 years ago

The way URLs are defined makes one believe the exact name of service is accepted. https://github.com/twitchtv/twirp/blob/03ac4c/docs/spec_v5.md#urls

In practice however both service and method names are changed. https://github.com/twitchtv/twirp/blob/03ac4c/protoc-gen-twirp/generator.go#L956

Should I add this to the spec for clarity and such that others do not get blind-sided by it while implementing?

@khevse discovered it in twirpy I believe at least the rust, swagger, TypeScript, Dart and php implementations suffer from this.

Aside: The method itself is marked internal API. It might be easier for implementations if it were public https://github.com/twitchtv/twirp/blob/03ac4c/internal/gen/stringutils/stringutils.go#L71

spenczar commented 4 years ago

Great catch. I think this is a bug in the Go implementation.

Fixing the bug by removing the camelcasing poses a compatibility challenge. If the generator is changed, then vendored clients won't request the right URLs for updated servers.

I think protoc-gen-twirp should generate code that accepts both camelcased and unchanged names. I think the only time this would cause problems is for pathological cases like this:

service Svc {
  rpc AmbiguousName(Msg) returns(Msg);
  rpc ambiguousName(Msg) returns(Msg);
}

but that service was broken anyway.

ofpiyush commented 4 years ago

Now I am more confused.

For the proto

package simple
service hello {
    rpc world(msg) returns(msg);
}

Is the desired behaviour:

  1. Use service and method names as is. /twirp/simple.hello/world
  2. CamelCase service name but not method name. /twirp/simple.Hello/world
  3. CamelCase both. /twirp/simple.Hello/World
spenczar commented 4 years ago

I think the spec points to 1. Use service and method names as is. I think protoc-gen-twirp is in the wrong. If it switches, though, it might cause compatibility problems; accepting both (again, just in protoc-gen-twirp) would soften the blow.

spenczar commented 4 years ago

To be crystal clear: I no longer work at Twitch, and I'm not in charge around here. I haven't even had a chance to write much Go in 6 months. My opinion is just that, my opinion.

I think it's also reasonable to change the spec to match the Go implementation. It's just that that sends a funny signal - it indicates that the spec is just a loose human-readable description of the Go implementation, rather than being the concrete definition that the Go implementation should strive to implement.

ofpiyush commented 4 years ago

I think you're correct in saying that spec should guide implementation.

It is difficult/messy as the mux handler use cases will break for server without external code change as PathPrefix is going to change if we go with spec.

This needs discussion with current decision makers I guess. I can raise a PR which moves go from 3 to 2 and ensures backwards compatibility as a start.

dpolansky commented 4 years ago

We have a meeting lined up to discuss this issue next week. I'll report back when I have notes from that conversation.

dpolansky commented 4 years ago

I'm back with notes!

We agree that the language implementations should follow the spec, because otherwise it would no longer be a spec. Given that there's potential for ambiguity, I intend to make a PR to clarify in the spec that the service and method names should be used literally. We should also document that the Go implementation is not currently compatible with the spec in this regard.

We spent the bulk of the conversation talking about different approaches to fixing the incompatibility. Here are some brief descriptions of those approaches (its not a full list):

The general consensus was that we should make it clear to users that the Go implementation does not follow the spec, and that it may not be ideal to change the Go implementation's behavior for compatability purposes if there is an alternative. The last item on the list, warning the user about the spec incompatibility, seems like a reasonable middle-ground: it makes the incompatibility clear to users and tells them how to fix it, yet it does not break the Go implementation's current behavior. Of course, warnings are easy to ignore, so that approach has its limitations.

Does it seem like we're on the right track? Does the "warn the user" solution seem like a reasonable way to mitigate this issue?

spenczar commented 4 years ago

Yeah, I think warning is a balanced and pragmatic approach here. I think that switching to literal casing in a new major version should be done too - you could do both of those things, right?

ofpiyush commented 4 years ago

Thanks for taking the time to talk on this internally.

I agree with @spenczar with some additions:

Are you against anonymous statistics reporting from protoc-gen-twirp somehow to get a picture of how bad things can get? If go small case proto is low single digit percentage for ~3 months, maybe we can drop the CamelCase and say sorry to the ones affected? 😅


Edit: Re-ordered 2 and 3. Improved wording.

dpolansky commented 4 years ago

There's no question that the Go implementation is in violation of the spec, but I'm not yet convinced that the impact of the violation is severe enough to warrant a new major version. Adding a warning to protoc-gen-twirp helps raise awareness of the issue and guides the user towards actions they can take to address it.

It's definitely gross to be in violation of the spec, but it's important to quantify the impact and weigh it against the cost of fixing it. Releasing a new major version means breaking backwards compatability and, critically, requires that users upgrade in order to take security updates. There is a clear path forward that users can take to prevent the issue from affecting them (use the protobuf style guide's naming conventions). It also seems reasonable to assume that this issue affects a fairly small proportion of Twirp users.

ofpiyush commented 4 years ago

If the impact is small, we can make the implementation spec compatible and deal with the small amount of hate that comes with it.

Correct it when the impact is still small?

dpolansky commented 4 years ago

@ofpiyush I think we're thinking of two different types of impact.

The impact of making any backwards incompatible change to the Go implementation is significant because it requires a major version bump. There's a cost that users must pay to safely upgrade to a new major version, and not upgrading leaves users at risk of not being able to take updates (such as a security fix). The version bump affects all users of the Go implementation.

The impact of the Go implementation's spec violation is relatively small because a user would need to be doing cross-language Twirp communication, and the user would need to use a protobuf schema that doesn't follow the protobuf style guide. As mentioned above, it's still possible for these affected users to workaround the faulty behavior of the Go implementation by using a protobuf schema that generates the same casing across language implementations.

I hope this distinction doesn't come across as dismissive; my intention is just to make it clear how I'm thinking about the impact and cost that comes with a major version bump.

ofpiyush commented 4 years ago

First of all, I think we need to resolve the assumptions we're working with. These are mine:

  1. Spec should be as simple as possible.
  2. No matter how we solve this, some people are going to get affected.
  3. Least number of people must be affected.
  4. Things should be as explicit and clear as possible.
  5. Default user behaviour should lead to spec compatible usage.

If something doesn't align with the project's assumptions, I'd be happy to change my POV and re-think.

Things I do not know and would really help clarity on:

  1. Are there ways of measuring how many people will get affected? (or some other proxy for effort needed to switch)
  2. How fast are users growing?

There's a cost that users must pay to safely upgrade to a new major version

This cost is what I'm trying to gauge here. If very few people get affected and we can provide tooling to upgrade, then it need not be that big an issue. We're working with information gap so maybe reducing that with anonymous reporting might help?

it's still possible for these affected users to workaround the faulty behavior

This violates the 4th and 5th assumption I'm working with. I'm seeing this as having a pseudo-spec without explicitly adding it to the spec.

Something lurking to be discovered when people start using multiple languages together. Then they'll anyway need to change their proto, which might have been given out to third party integrators. (One of the use cases @verloop was public APIs over twirp. Thankfully, proto definitions were CamelCased already)

That IMHO is worse than either extreme. Lack of this unknown hidden behaviour in some implementations was one of the big selling points of twirp over grpc for me.

Solutions

Both paths resolve this one way or the other.

dpolansky commented 4 years ago

Thanks for discussing this issue constructively, @ofpiyush.

First of all, I think we need to resolve the assumptions we're working with. These are mine:

  1. Spec should be as simple as possible.
  2. No matter how we solve this, some people are going to get affected.
  3. Least number of people must be affected.
  4. Things should be as explicit and clear as possible.
  5. Default user behaviour should lead to spec compatible usage.

If something doesn't align with the project's assumptions, I'd be happy to change my POV and re-think.

I think some of these assumptions are a bit too broad for the project in general, but I agree with some of them on this specific issue. For instance, if there was a high impact security issue in Twirp that only affected a small number of users, the impact may outweigh the fact that it only affects a small number of users.

I think we've agreed above that the spec should guide implementations, that some users are affected by this, and that clarity is important. For item 5, I agree in principle, but clearly it was unintentional that the Go implementation does not follow the spec. Given the fact that simply fixing the issue directly is a breaking change, assessing alternatives seems prudent.

This violates the 4th and 5th assumption I'm working with. I'm seeing this as having a pseudo-spec without explicitly adding it to the spec.

I disagree with you here. I think it's possible for an implementation to have a documented violation of the spec without compromising the integrity of the spec. It doesn't mean the spec is wrong; it just means the Go implementation is wrong. I think there's a distinction there.

This cost is what I'm trying to gauge here. If very few people get affected and we can provide tooling to upgrade, then it need not be that big an issue. We're working with information gap so maybe reducing that with anonymous reporting might help?

There isn't data on how widely Twirp is used today outside of Twitch, and we will not begin collecting that data. The difficulty of the upgrade isn't as concerning as the fact that all users would need to upgrade to continue receiving updates. Within Twitch alone, there are on the order of hundreds of services that use Twirp, and there's a clear organizational cost that comes with performing a major version upgrade. That's not to say that the cost is never worth paying (again, something like a high impact security issue would be worth paying that cost for, in my opinion), but it's safe to say that its significant for the community at large.

I don't think you have acknowledged the approach of documenting the violation and making it clear to users that it exists. Do you have specific concerns with that approach?

ofpiyush commented 4 years ago

I don't think you have acknowledged the approach of documenting the violation and making it clear to users that it exists. Do you have specific concerns with that approach?

I think my agreement with a major version update might have overshadowed the rest of the points in https://github.com/twitchtv/twirp/issues/244#issuecomment-652994645

Please address them individually.

mspindelhirn commented 4 years ago

Hi @ofpiyush,

just jumping in coming from a pure library user. I dont really get that approach:

  • We should make a separate spec compatible package in go. Without that, there's just one non-compliant implementation.
  • Remove the non-compliant package from this repo. Official repo with an implementation that is not compatible will keep adding to the fire.
  • Warnings should start on protoc-gen-twirp then from generated server and client code as well to actually show up in logs meaningfully somewhere. Since the implementation is non-compliant, it can be a little annoying to push people away from it.

The very first step would still require a new major version update, right? Because "just moving" packages around brings the same BC break. If so why even keep the legacy package?

ofpiyush commented 4 years ago

We should make a separate spec compatible package in go.

This wouldn't break compatibility, the new package could be made the default new user behaviour by updating readme. Effectively containing the issue to existing users, not letting it grow every day as twirp grows popular.

Warnings should start on protoc-gen-twirp then from generated server and client code as well to actually show up in logs meaningfully somewhere.

This wouldn't break compatibility either. It will be annoying yes, that is the whole point of doing it. But it will not be breaking in any way. This should be done before removing. So I'm re-ordering it that way now. Thank you for making me rethink the order!

Remove the non-compliant package from this repo.

Existing generated code imports other packages and not protoc-gen-twirp. This as well would not break backwards compatibility on either server or client.

Given packages in the ecosystem can move things around, people generally do not keep go get github.com/twitchtv/twirp/protoc-gen-twirp kind of statements in their builds. One bad merge into master can break those kind of builds.

I'd wager that the typical behaviour is to have the binary on local or already baked into an image ready before building. Updating once in a while when someone has the time and energy to do it. This would also be the perfect time for them to read about the nuances and decide the path for themselves!

Nevertheless yes, this will have some impact and would break builds, but not existing server and client code.


PS: All these plans, even a new major release (if there will be one) will be spread out over months and will need more thought.

Right now we're in disagreement on the general plan of approach.

Trade offs between maintenance overhead/ existing user impact / long term project health are the major points of conflict right now. We're all working with imperfect data and trying to present the case for each :)

dpolansky commented 4 years ago

We should make a separate spec compatible package in go.

It seems like you're suggesting that we make a second generator, direct users to it, and then delete the old generator. I think this is ultimately far more painful than it would be to just fix the issue directly and make a major version upgrade. This kind of change would mean that every user of Twirp needs to change how they generate their code. I don't think having multiple implementations of the Go generator in this repo is something we'd consider.

I'd wager that the typical behaviour is to have the binary on local or already baked into an image ready before building. Updating once in a while when someone has the time and energy to do it. This would also be the perfect time for them to read about the nuances and decide the path for themselves!

Nevertheless yes, this will have some impact and would break builds, but not existing server and client code.

I don't understand this. Twirp will never intentionally break builds for users.

Warnings should start on protoc-gen-twirp then from generated server and client code as well to actually show up in logs meaningfully somewhere.

I don't think it would be good for Twirp to just start writing warnings into stdout in the server and client code. Twirp doesn't do this today, and it could be pretty surprising behavior if it started doing this.

ofpiyush commented 4 years ago

I am trying to find a way to eventually(over months) completely resolve this. I see warnings as a step along the path and not the final solution. I think @spenczar was also hinting at that.

The option to make protobuf style guide mandatory is also open and would also lead to complete resolution.

Please reconsider the new user experience. They'll come to this repo and see both go and python implementations with warnings in the Readme. As the project grows, not all users will start from the go implementation.

Having said that, it has been over a month and I'm out of options and hope. Now I'll focus on questions specific to twirpy and for other implementations.

dpolansky commented 4 years ago

@ofpiyush I met with some internal folks again today to discuss this issue; I'm hoping this can provide some clarity on how you should proceed with twirpy and on what we're able to do mitigate this on the Go implementation side.

What should other language generators do with their implementation? Should we add warnings as well?

Implementations in other languages should follow the spec as its written. It shouldn't be necessary to add warnings to other languages because the issue should only affect Go clients that talk to non-Go servers (more on this below).

What should we tell people that started with another language implementation and run into this issue while writing their first service with go?

I think it makes the most sense to have this limitation documented in this repo, and for other implementations to link to it if questions arise.


At this time, we cannot support a major version upgrade to fix this issue. There may be a time where it makes sense to bundle several backwards-incompatible improvements together in a single major version upgrade, but we can't make any guarantees that it will happen. I can certainly empathize with the experience of both Twirp's users and maintainers of other language implementations as a result of this issue, but we are not able to effectively support a major version upgrade at this time.

To mitigate the issue, we would support the following:

ofpiyush commented 4 years ago

make the Go server implementation serve requests at paths for the literal casing used in the schema in addition to the current functionality

This will need work from the user as well in some cases. We can manage method names without any changes from the user, but service name will need changes from the user.

One possible path which will contain this to 1 user change only, is to add a new method:

example.RegisterHaberdasherServer(mux, handler)

Then we can keep changing code inside that function over releases and eventually into the new major version(if that ever happens) while making sure backwards compatibility is maintained.

this effectively contains the issue to the Go client, which would still make requests to the camel-cased paths

For go client, we can do something like this:

var UseLiteralCasing bool = false

If someone is using go client with other language servers, they can set it to true and make it work.

Both of these changes increase API surface and have trade offs that we can discuss.

Adding compatibility on method names on go server is a safe single step along this path so I am sending a PR for that now.

dpolansky commented 3 years ago

For go client, we can do something like this:

var UseLiteralCasing bool = false

If someone is using go client with other language servers, they can set it to true and make it work.

Both of these changes increase API surface and have trade offs that we can discuss.

Twirp has generally avoided configuration of this type, so it's unlikely something like this would be supported.

For now, I think the server compatability change and documentation are the right steps for Twirp to take to mitigate this. The door would remain open to potentially changing the Go client in a major version upgrade down the road. Advising affected users to change their protobuf schema (or ultimately changing the behavior in a major version) seems better than adding this sort of configuration into Twirp.

mfridman commented 3 years ago

Just a different perspective. The spec also does not explicitly mention using the service and method names "as is".

Furthermore, from the docs in the HTTP Routes:

The <Service> and <Method> names are CamelCased just as they would be in Go.

With the example:

POST /twirp/twirp.example.haberdasher.Haberdasher/MakeHat

So if anything, one would assume the desired behaviour were to CamelCase the Method and Service portions of URLs.


The protobuf style guide suggests:

If your .proto defines an RPC service, you should use CamelCase (with an initial capital) for both the service name and any RPC method names

So the assumption is majority of users are already writing proto files such as:

service Public {
  rpc UserLogin(UserLoginReq) returns (UserLoginResp);
}

Which results in: /twirp/pkg.Public/UserLogin

So, my question is

  1. What warning would you give users? (After reading this thread I'm still unsure what the recommendation is)

iirc the officially supported implementations already use CamelCasing, so why swing towards an "as-is" implementation?

ofpiyush commented 3 years ago

Twirp has generally avoided configuration of this type, so it's unlikely something like this would be supported.

The next best option I have is to generate a compatibility client hook and use a map to override URL path like I did on the test here.

User experience could look something like:

client := example.NewHaberdasherProtobufClient(url, client, twirp.WithClientHooks(example.CompatibilityHook))

@mfridman I've suggested updating spec multiple times on this thread. Thank you for digging out some official word on the matter!

@dpolansky I would strongly suggest getting internal stakeholders to talk directly on the threads that affect them.

dpolansky commented 3 years ago

@mfridman Although the section on HTTP Routes is more clear about the casing, the spec implies that the names are used literally. As you point out, many Twirp users are already writing protobuf definitions that follow the style guide. A warning could be used to tell users when their definitions are not following the style guide, which would lead to incompatibility across implementations.

I don't think we should do anything drastic to mitigate this issue right now, such as changing the spec or making backwards-incompatible changes to the Go implementation. I think we should move forward with allowing literal casing in the Go server implementation and with providing docs for users about how they can address this issue. Over time, if enough users experience pain with this issue, we can progress towards more significant fixes.

marioizquierdo commented 3 years ago

Twirp fundamentally converts the Service definition into an interface that can be implemented in your programming language of choice. Mapping HTTP routes to interfaces and methods.

A more generic version of this problem is that different programming languages have different naming conventions and restrictions. For example, a Ruby method cannot begin with a number, and is common to use underscored names. In Go, an exported method name cannot begin with a lowercase letter. But in JavaScript is common for all method names to start with a lowercased letter. Actually, HTTP routes can contain characters that would not map well to most programming languages (https://stackoverflow.com/questions/4669692/).

For example:

/twirp/mypackage.00~**Cool;;Service;;Name**~!/$($)**Cool;;Method;;Name**

So, in a way, it is unrealistic for the Twirp spec to say that routes should be handled with the exact values. The routes could be matched, but converting route characters to valid method names in the given language will end up creating routing collisions.

A more realistic approach for the spec could be to explicitly allow names that follow the Protobuf best practices (./), and be undefined for other names. This would help to setup expectations when working on new language implementations.

ofpiyush commented 3 years ago

We would certainly be having that conversation if twirp were generic and independent of protobuf.

The language specification of protobuf doesn't allow for names like the example:

/twirp/mypackage.00~**Cool;;Service;;Name**~!/$($)**Cool;;Method;;Name**

https://developers.google.com/protocol-buffers/docs/reference/proto3-spec#identifiers

ident = letter { letter | decimalDigit | "_" }

All languages that one could use twirp in, can do literal case naming following the protobuf language specification if they wish to do so.

There is a distinction to be made between wire protocol and method names. It is required for wire protocol to be as specific as possible so that we can work across languages.

There is also a distinction to be made between a style guide and language specification. If the project wishes to make the style guide mandatory, that's a very legitimate option.

Specifically,

explicitly allow names that follow the Protobuf best practices

means other languages should follow protobuf style guide. was that the intent?

marioizquierdo commented 3 years ago

@ofpiyush gotcha. No need to worry about other characters in Twirp then. Scratch that generalization I made. We only have to worry about mapping between names with letter { letter | decimalDigit | "_" } and different programming languages 👍

explicitly allow names that follow the Protobuf best practices

means that Twirp implementations in other languages must guarantee deterministic routing if the names follow the protobuf style guide. That is achievable and works well with the limitations of the Golang implementation. Keeping other cases as undefined should help clarify the possible issues that may happen when using a different naming style.

ofpiyush commented 3 years ago

Keeping other cases as undefined should help clarify the possible issues that may happen when using a different naming style.

Make this gotcha explicit in the spec document and this issue is resolved. 👍

ofpiyush commented 3 years ago

With the latest merge, it is all resolved and done! 😌

I learnt a lot about communication through this issue and related PRs, grew as a person. In a lot of places, I could have done much better, I will in the future 😅

I'm glad and very proud of people at twitch that some of my mistakes in personal communication did not overshadow the actual issue and we were eventually able to work past them.

Thank you for bearing with me through the long and arduous back and forth, especially @dpolansky.

Closing as resolved.

akshayjshah commented 3 years ago

Hi friends! I realize that the code changes for this issue were resolved quite a while ago. However, I found the GoDoc for WithClientLiteralURLs quite confusing - it jumps right into the details of the protocol without any high-level guidance.

To me, the docs would be much clearer if they started with an introductory paragraph like this:

WithClientLiteralURLs changes the client's URL-casing behavior to conform strictly to the Twirp protocol specification. This improves compatibility with other Twirp implementations. Clients should enable WithClientLiteralURLs unless (a) they're making requests to Go servers built with older Twirp versions, AND (b) the server's protocol buffer definitions don't follow the casing guidelines recommended in Google's style guide.

If you'd like, I can open a PR with similar text.