Proposal: Webhook callback when asynchronous create binding or provision completes

trevorlinton commented 6 years ago

Purpose

When a client makes an asynchronous provision or bind call to an OSB API, it must then poll the last_operation endpoint to learn whether the request succeeded or failed. This creates a few unnecessary problems.

First, the client must retain state about what it was doing and what to do next. If the client making the provision request does not persist this state and then crashes, restarts, or is redeployed, the provisioned resource is orphaned. Since asynchronous provisioning windows can sometimes run up to 30 minutes, this places a large burden on clients to retain considerable state.

Second, if the broker receives a large number of provision requests, it may be overwhelmed by clients polling its last_operation endpoints.

This could be alleviated (or at least clients could be given an alternate workflow) by supporting an optional URL callback: during an async provision or async create binding request, the client provides a URL to which the result of the operation should be sent. When the operation completes (successfully or not), the URL is called with the information that would normally have been returned had the caller made the provision or create binding call synchronously (NOT the response of the last_operation endpoint).

Rationale

Not all systems or brokers are truly platform aware. Consider a broker designed to issue databases via a REST interface: it may not perform the binding operations itself (though it may internally keep track of who is using the provisioned database). A client may be the calling platform that asks the broker to create a new resource, then captures the credentials during the binding phase and uses them to support applications on its platform. A broker may in fact support multiple platforms (think of an Azure/AWS/GCloud type system that may have a generic OSB database provisioner).

Since brokers in these scenarios have no real knowledge of the application during binding or provision requests, they cannot reliably clean up an operation that was abandoned because the client failed partway through the provision or binding window. They also rely on the client to continue polling, which can lead to wasteful operations, or to significant delays in provisioning if a client's poll interval is misconfigured and far too long.

Design

On PUT /v2/service_instances/:instance_id or PUT /v2/service_instances/:instance_id/service_bindings/:binding_id, if accepts_incomplete=true is passed, an optional webhook and secret may be provided. The webhook parameter MUST be a valid URI-encoded http or https URI. The secret MUST be provided if the webhook parameter is provided, and MUST be text of no more than 32 bytes; the client uses it to validate requests coming from the broker. The secret SHOULD be unique to each provision or create binding call.
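As a rough illustration of the rules above, a broker might check the two parameters along these lines. This is only a sketch: the function name `validate_webhook_params` is hypothetical, and the proposal does not say whether `webhook` and `secret` travel in the query string or the request body, so the function simply takes the two values.

```python
from urllib.parse import urlparse

MAX_SECRET_BYTES = 32  # limit stated in the proposal


def validate_webhook_params(webhook, secret):
    """Return a list of problems; an empty list means the pair is acceptable.

    Hypothetical helper: how the values reach the broker is not specified
    by the proposal, so they are passed in directly.
    """
    problems = []
    if webhook is None:
        # The webhook is optional, so there is nothing to validate.
        return problems

    parsed = urlparse(webhook)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("webhook must be an http or https URI")

    if secret is None:
        problems.append("secret is required when webhook is provided")
    elif len(secret.encode("utf-8")) > MAX_SECRET_BYTES:
        # The limit is in bytes, so measure the encoded form.
        problems.append("secret must be no more than 32 bytes")

    return problems
```

A broker would presumably reject the request when the returned list is non-empty, though the proposal does not say which status code that should be.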

Brokers would be required to perform an http or https POST with the results of the completed operation (error or not) to the URI in the webhook parameter. The http operation MUST have the content-type: application/json header.

To let the client validate that the request came from the broker (and to prevent replay attacks), the secret is NEVER passed (in any form) to the webhook URL. An HTTP header x-osb-signature MUST be provided in the webhook http or https POST, the value of which is the SHA-256 HMAC of the serialized payload, keyed with the secret. The raw binary HMAC is encoded in base64, NOT as a hexadecimal representation, nor as a base64 encoding of a hexadecimal string.
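The signing scheme described above can be sketched with the Python standard library. The function names are hypothetical, and the sketch assumes the secret is UTF-8 text and that the signature is computed over the exact bytes sent as the POST body.

```python
import base64
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Broker side: base64 of the raw (binary) HMAC-SHA256 digest."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).digest()
    # Encode the raw digest bytes, not a hex string of them.
    return base64.b64encode(digest).decode("ascii")


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Client side: recompute the signature and compare in constant time."""
    expected = sign_payload(secret, body).encode("ascii")
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

The client should verify against the raw request bytes it received rather than a re-serialized copy of the JSON, since re-serialization can change whitespace or key order and so change the digest.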

The result of the webhook operation (http status code, headers, etc.) is ignored by the broker, regardless of whether the webhook destination returns an error. The body of the webhook response is also ignored by the broker, and it is highly recommended that the broker not process or download it.
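A minimal sketch of a broker delivering the callback this way, using only `urllib` from the standard library. The function names are hypothetical, and the sketch assumes the URL was already validated when the provision or binding request was accepted.

```python
import http.client
import urllib.request


def build_webhook_request(url: str, body: bytes, signature: str) -> urllib.request.Request:
    """Build the POST carrying the operation result and its signature."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-OSB-Signature": signature,
        },
    )


def deliver_webhook(url: str, body: bytes, signature: str, timeout: float = 5.0) -> None:
    """Send the callback once and ignore whatever comes back."""
    request = build_webhook_request(url, body, signature)
    try:
        response = urllib.request.urlopen(request, timeout=timeout)
        # Close without calling read(): the response body is never downloaded.
        response.close()
    except (OSError, http.client.HTTPException):
        # Covers connection failures, timeouts, and non-2xx statuses
        # (urllib raises HTTPError, a subclass of OSError, for those).
        pass
```

Never reading the response body is also what keeps an oversized or malicious response from being buffered by the broker.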

Considerations

The design does not account for the case where the webhook destination is unreachable by the broker due to firewall limitations; the feature therefore should not be required.
It does not have a sophisticated mechanism for managing errors coming from the callback URL (e.g., status codes < 200 or > 399), and therefore should not be required.
It could open up problems where a user designs a malicious webhook URL that returns a gigabyte of data in its response; a misconfigured broker would attempt to buffer the response and could be subject to a denial of service attack. It is therefore strongly recommended that brokers ignore any and all bytes coming back in the http or https response.
This may be slightly confused with #572; this proposal attempts to solve an unrelated issue.
It may be easier to simply use the authorization information received in the create binding or provision request as the secret, and drop the separate secret parameter for simplicity. This does create some complexity for brokers though, as they may have to keep track of the authorization. A more complex "hash" of the authorization could be kept instead, but this then significantly complicates verification by the client.

tinygrasshopper commented 6 years ago

Thanks for the writeup.

Another possible solution to reduce the problem of the broker being 'over polled' could be for the broker to return a timestamp in the last operation response, after which the platform should poll the broker again.
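On the platform side, honoring such a timestamp could be as simple as the sketch below. The function name and the `poll_after` value are hypothetical, as is the assumption that the timestamp arrives as an ISO 8601 string with an explicit UTC offset; the thread does not fix a field name or format.

```python
from datetime import datetime


def seconds_until_next_poll(poll_after: str, now: datetime) -> float:
    """How long the platform should wait before polling last_operation again.

    `poll_after` is a hypothetical ISO 8601 timestamp with an offset,
    and `now` must be a timezone-aware datetime.
    """
    target = datetime.fromisoformat(poll_after)
    # A timestamp already in the past means the platform may poll right away.
    return max(0.0, (target - now).total_seconds())
```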

duglin commented 6 years ago

IBM to see if they have any need for this

duglin commented 6 years ago

Would the broker telling the platform when to poll next help?

trevorlinton commented 6 years ago

Returning a timestamp or when to poll next does help with congestion but doesn't help resolve the problem of holding state by the client (and potentially orphaning resources if the client crashes).

As the broker must already store state (e.g., what is being provisioned), it's fairly trivial for the broker to also store a callback url (and secret) and subsequently call the url when finished. The addition of this functionality would greatly ease the burden of developing clients and I believe would increase its adoption, albeit I'm programming a client, so I'm a bit biased in my assessment.

At a very minimum adding a timestamp is a great addition regardless, but a reactive approach (one that allows the broker to communicate back with the client after extended periods of time) would be helpful.

duglin commented 6 years ago

I checked with our product guys and we definitely want the "when to poll next timestamp" feature. As for the call-back, they like that one as well.

jmrodri commented 5 years ago

The callback has come up a few times with the Automation Broker (aka Ansible Service Broker), but our broker doesn't know "where" the platform is to return the call, i.e. the url would have to be a system that is accessible to the broker. If that's the case then I don't see why it couldn't. I would definitely make it optional.

From a broker author's perspective, we would assume that the URI given is accessible; otherwise messages will effectively get dropped. If you are losing messages, fix the firewall to allow the broker to access the URI.

This isn't a feature we are desperate to have, but I see the value in it. +1 to webhook

As far as alleviating the polling, timestamp might be useful especially with longer provisions. On a different project we had these long provisions that we were polling so we started with a long poll interval to allow the provision to get as far as possible, then did shorter and shorter intervals as we knew it would be close. This helped with the bombardment of requests.

So +1 to the timestamp as well.
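The "long interval first, then shorter and shorter" polling described above can be sketched as a generator of wait times. The name `shrinking_intervals`, the halving factor, and the numbers in the usage note are illustrative assumptions, not values from that project.

```python
def shrinking_intervals(initial: float, minimum: float, factor: float = 0.5):
    """Yield poll intervals that start long and shrink toward a floor."""
    interval = initial
    while True:
        yield interval
        # Shrink the wait each round, but never go below the floor.
        interval = max(minimum, interval * factor)
```

For example, starting at 240 seconds with a 15 second floor, the waits halve each round (240, 120, 60, 30) and then stay at 15.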

mattmcneeney commented 5 years ago

@tinygrasshopper is going to have a go at putting a PR together that solves this problem by allowing service brokers to return a timestamp indicating when a Platform should next request the status of an async operation.

mattmcneeney commented 5 years ago

We believe #621 should resolve this issue, so will track progress over there

mattmcneeney commented 5 years ago

Closing as we believe #621 will help here and the desire for having a webhook seems to be low. Please reopen if I'm wrong though!
