palantir / conjure

Strongly typed HTTP/JSON APIs for browsers and microservices
https://palantir.github.io/conjure/
Apache License 2.0
421 stars 66 forks source link

Support migration of endpoints from POST to PUT #614

Open mglazer opened 4 years ago

mglazer commented 4 years ago

Motivation

There are times when I want to migrate an endpoint from a POST method to a PUT method. Primarily this is brought on by placing my service behind an nginx proxy, where nginx by default will not retry POST requests. In the case that my endpoint was already idempotent, or I do some work to make my endpoint idempotent, I want to have some mechanism for safely transitioning my endpoint from POST to PUT in a reasonable way.

Currently, there are two ways I have found to achieve this:

  1. Create a legacy endpoint which sits next to new PUT endpoint, but responds to POST:
      doWork:
        http: PUT /doWork
        args:
          work:
            type: set<Long>
            param-type: body
        returns: list<Work>
      legacyDoWork:
        http: POST /doWork
        args:
          work:
            type: set<Long>
            param-type: body
        returns: list<Work>
        deprecated: POST methods are not retried by default, please use DoWork
          with PUT
  1. Handle this with "magic" at the infrastructure layer. With the undertow implementation, this just involves adding Endpoints which supports POST in addition to PUT.

I personally am in favor of option 2 because I do not want to have legacy methods sit on my public API forever, but there are others who disagree and prefer less magic. I would prefer a solution that makes everyone happy.

Proposal

I'd propose that we allow for someone to describe an endpoint something like the following:

      doWork:
        http: PUT|POST /doWork
        args:
          work:
            type: set<Long>
            param-type: body
        returns: list<Work>

Which tells clients to prefer PUT, but enables server implementations to automatically generate PUT and POST methods (where the POST implementation is identical to the PUT).

I'm also open to basically any other option here which doesn't involve API migrations requiring parallel APIs.

ferozco commented 4 years ago

Hey @mglazer,

what is the issue with approach 1), marking the endpoint as deprecated and then eventually removing it? Alternatively, if the goal is to mark specific endpoints as idempotent why not do that explicitly using endpoint markers?

carterkozak commented 4 years ago

Endpoint markers don’t tell nginx and dialogue whether or not a request may be retried. I think using the right verb is the correct solution in this case, markers will result in friction elsewhere.

mglazer commented 4 years ago

So my reasoning for not doing your suggestion is the following:

Say I have a number of customers for my service which I do not have access to, how do I know if they’re using the new endpoint or the legacy one? I really don’t. I also can’t trivially find that out without asking them for request logs. I also can’t do this by asking them to grep their code, because all of their code only references the doWork method. The true search needs to be a function of the library version they pulled in and me communicating which specific version of the client libraries contains the change.

The second reason is that it pollutes my API forever. I’ll make the assumption that I can never delete these legacy endpoints for the reasons I specified above. So now I have possibly hundreds of “Legacy” methods on my API forever which serve no purpose except for code generation on the server. It seems less than ideal.

carterkozak commented 4 years ago

Practically why can’t we delete the “legacy” versions if we could complete the migration from put->post? Or would we support that dual method mode forever?

Our clients emit deprecation metrics when they hit deprecated endpoints (on thumbs at the moment otherwise I’d link the definition) so we can query for clients which are hitting endpoints that the server has deprecated even prior to client library upgrades. Does this help your scenario?

mglazer commented 4 years ago

So in theory we could. But the problem comes down to how do i reasonably detect and manage that deprecation process? I know the metrics that exist and if I had access to every single running service metrics I could answer this, but the answer is that I don’t have that, and it comes down to, will I bother spending the significant energy required to manage this deprecation process?

Likely not. So the likely end state here is I leave the legacy endpoints on my API forever, because the effort to clean my API for new users isn’t worth the risk and energy required to migrate legacy users.

markelliot commented 4 years ago

If you can't help clients upgrade, how does the proposed solution help with the migration?

Re: observability/client identification: You should be able to see server-side which endpoints clients are using and user-agents should help you track down the callers.

More generally, I think we'd be better off teaching nginx (or whatever proxy we end up using) and dialogue how to recognize whether an endpoint is idempotent even if it originally selected a non-idempotent verb. Roughly, we should default to respecting verbs as an indication of idempotency, and then allow marker-based overrides.

nginx can be taught to retry POST requests under certain conditions, and that's already a solution we have in use to overcome this challenge.

markelliot commented 4 years ago

Maybe another idea here would be to have the server return a Conjure-Retryable: true header, and we can then set up our proxies and clients to respect that header.

carterkozak commented 4 years ago

@mglazer fysa @lycarter automated the process of migrating from POST to PUT on a large internal project. Yes it leaves method stubs on some resources, but they simply delegate to the non-deprecated method with all args and return the result.

Roughly, we should default to respecting verbs as an indication of idempotency, and then allow marker-based overrides.

This is an interesting idea, I'm a little worried about the complexity of splitting the definition of idempotence across verb and other metadata, I suppose it depends whether or not we want to consider changes as API changes or not. I think it's helpful to bind this concept directly to the api location (verb-path tuple). How much cleanliness do we gain by supporting the fallback marker over excluding it entirely? Negotiating between differing client and server versions with different marker values can be difficult, particularly if a connection is terminated after the client has made a request but before the server has sent a response.

Maybe another idea here would be to have the server return a Conjure-Retryable: true header, and we can then set up our proxies and clients to respect that header.

Unfortunately that approach requires that the client and server both get far enough along that the client is able to read a response from the server. Unfortunately that's not always the case.

mglazer commented 4 years ago

@carterkozak : I'm aware of the auto-migration. I wouldn't have filed this issue if I felt that the auto-migration was actually a correct or optimal solution to this problem. As I stated above, I don't think that the right thing to optimize on is how long it takes a single developer to run some script over their existing APIs. What we should optimize on is the cost to consumers of having an API that may have its surface area increased by 2x (I say this as I am currently fixing a compile break because I have a class which implements one of these interfaces as a Delegate pattern and I am fixing some 50+ methods)

Note that I did suggest another option here, which is somewhat similar to @markelliot 's idea, which is:

  1. We add an Idempotent marker to Conjure definitions.
  2. When generating service implementations (particularly the Undertow implementations), we transparently implement both POST and PUT (this is an exceptionally tiny amount of code, and it would be auto-generated).
  3. New clients call with PUT, legacy clients call with POST.

We can still add special headers and such to tell nginx to be friendlier. I like that because it makes things a little more explicit, but I also think just letting the service implementations accepting of both PUT and legacy POST requests from legacy clients in the case where we mark a method as Idempotent, then that solves a lot of what I would want (perhaps I'm being short sighted here though).

lycarter commented 4 years ago

A few musings, based on my experience during the migration: