Support async binding operations

bmelville commented 7 years ago

Binding to an instance may encapsulate a long running workflow, similar to creating an instance. This might include provisioning new user accounts in a service, setting up firewalls, etc. It seems like the Bind API would benefit from the same last_operation model that Provision uses, and I'd like to get feedback from others who may have more context around this.

Latest proposal: https://github.com/mattmcneeney/servicebroker/blob/async-bindings/spec.md#binding

ccemeraldeyes commented 7 years ago

+1. I have already run into a scenario where I needed async binding (Google CloudSQL).

angarg12 commented 7 years ago

We also had the situation where bindings need to return information, potentially asynchronously.

This raises the question of whether it would be better to implement the last_operation function, or a fully fledged GET method on bindings.

Would a GET operation cover your use cases?

RamakrishnanArun commented 7 years ago

Would that be adding additional information in the response to the GET /last_operation?

avade commented 7 years ago

I would advocate that we follow the current async pattern of using last operation.

Adding a new endpoint to get bindings would also mean the bindings are stored in a retrievable way, which is likely a security issue.

On 17 Feb 2017 9:32 a.m., "Arun Ramakrishnan" notifications@github.com wrote:

Would that be adding additional information in the response to the GET /last_operation?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/openservicebrokerapi/servicebroker/issues/137#issuecomment-280601582, or mute the thread https://github.com/notifications/unsubscribe-auth/AAeB1xfbqr9z7Kv6GX9EvHvc9JnIUiotks5rdWkkgaJpZM4MDStm .

avade commented 7 years ago

Unless there is any further discussion I would suggest that @bmelville creates the proposed spec changes on a branch. This can then be shared with the cf-dev mailing list.

I would support this feature.

Any concerns @shalako?

avade commented 7 years ago

@vaikas-google will you be picking up drafting the spec changes in a branch?

vaikas commented 7 years ago

I think so. Though for the next few days I'm not able to get to this due to other commitments still being on my plate, but I'll tackle this as soon as I get a chance.

On Thu, Mar 2, 2017 at 3:14 AM, Alex Ley notifications@github.com wrote:

@vaikas-google https://github.com/vaikas-google will you be picking up drafting the spec changes in a branch?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openservicebrokerapi/servicebroker/issues/137#issuecomment-283625918, or mute the thread https://github.com/notifications/unsubscribe-auth/AKwedJQHUXRR7IjbS81SeFRjeY5aVyG_ks5rhqR4gaJpZM4MDStm .

avade commented 7 years ago

It is also interesting the spec doesn't include timeout information.

Currently, Cloud Foundry times out binding requests after 60 seconds by default. This is configurable by the operator on a global basis, meaning it applies to all brokers.

https://github.com/cloudfoundry/cloud_controller_ng/blob/master/bosh/jobs/cloud_controller_ng/spec#L570

@pmorie does service catalog have any request timeout logic that you are enforcing?

mattmcneeney commented 7 years ago

I've created a branch that adds async binding and unbinding to the spec: View proposed spec (or changes).

Please provide any feedback you have on this change. The spec changes are relatively straightforward, making use of the workflows already in place for provisioning, updating and deprovisioning. Thanks!

mattmcneeney commented 7 years ago

Following yesterday's call with the working group, I've updated the branch with an updated spec containing a last operation endpoint for service bindings.

Let us know your thoughts, thanks!

gberche-orange commented 7 years ago

@mattmcneeney I understand that the synchronous binding endpoint PUT /v2/service_instances/:instance_id/service_bindings/:binding_id returns the credentials and there is no GET endpoint to retrieve credentials afterwards.

However, I don't see the credentials being returned by the last operation GET /v2/service_instances/:instance_id/service_bindings/:binding_id/last_operation endpoint

https://github.com/openservicebrokerapi/servicebroker/commit/c1f610a62c3d7f6ecd42aea9d8cc6d50236143f3#commitcomment-21437274

How would platform clients fetch the binding credentials ?

mattmcneeney commented 7 years ago

Good spot @gberche-orange, it should be in the asynchronous response too.

I've updated the spec so that credentials (and other information) can be included in the last service binding operation endpoint response.

avade commented 7 years ago

@mattmcneeney as we discussed, this seems like the simplest path forward for brokers.

It does mean brokers that support async responses would need to hold on to the credentials / be able to retrieve them until "last_operation" is called.

mattmcneeney commented 7 years ago

Thanks for the feedback on this so far everyone. Does anyone else have any further thoughts on the proposed spec?

gberche-orange commented 7 years ago

thanks @mattmcneeney for the update to the GET /v2/service_instances/:instance_id/service_bindings/:binding_id/last_operation endpoint response.

I wonder whether the specification should mandate that the last_operation endpoint only returns once the credentials, i.e. within the response where "state": "succeeded" was returned.

This would make it consistent with the current synchronous behavior for which there is currently no GET endpoint to retrieve credentials, possibly strengthening security by reducing credentials undesired retransmits over the wire, and reducing the duration into which broker should manage credentials state.

This would be a more defensive wording than the current

Returning "state": "succeeded" or "state": "failed" will cause the platform to cease polling.

mattmcneeney commented 7 years ago

An alternative solution would be to add an additional GET endpoint that would return the binding information. The platform talking to the broker could handle permissions on who could actually see any information returned in that call (such as credentials).

Additional logic around how long service brokers have to keep credentials for (ie. until the last_operation endpoint has been polled and a 'succeeded' response was returned) seems complex, but you're right that it would work more similarly to the synchronous behaviour.

Edit: There may be overlap with issue Fetching current input parameters values (for a service or a binding) regarding the alternative solution mentioned above.

gberche-orange commented 7 years ago

I don't have strong a opinion on either approach.

Potential drawbacks I can think of for new GET /v2/service_instances/:instance_id/service_bindings/:binding_id new endpoint:

It seems that adding this new GET endpoint for credentials further pollutes the API specs for brokers that would only need to support support synchronous bindings.
client callers might get confused in expecting that the GET /v2/service_instances/:instance_id/service_bindings/:binding_id new endpoint would also be returning synchronously created binding credentials

mattmcneeney commented 7 years ago

I agree with those drawbacks @gberche-orange.

Since this overlaps with issue #159 regarding Fetching current input parameters values (for a service or a binding), we can wait to see what everyone's thoughts are on that issue before making a decision here.

duglin commented 7 years ago

For those who want to see a diff, I think this is the link: https://github.com/openservicebrokerapi/servicebroker/compare/master...avade:async-binding

duglin commented 7 years ago

Couple of comments:

https://github.com/openservicebrokerapi/servicebroker/compare/master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR328 talks about bindings instead of provisioning.
why are service_id and plan_id needed on: https://github.com/openservicebrokerapi/servicebroker/compare/master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR350 ? The text says they should be there but I'm not clear on why and what happens if they don't match the previous binding request? If they're optional it seems like we should just remove them since the broker is expected to work w/o them.
same question for "operation". Are brokers expected to support returning the status of a bind() and unbind() at the same time? Or just the last one invoked? It seems like "operation" only makes sense if the platform can query for a previous command, but then of course that makes the phase "last_operation" kind of misleading.
for the previous 2 comments, if the answer is "they're there just for consistency and not for technical reasons" then ok - I can live with that, but then we may want to consider deprecating them.
on https://github.com/openservicebrokerapi/servicebroker/compare/master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR367 the text implies they should keep polling until they get a 200, but it seems to me that they should stop on other errors too, like a 404.
Along those lines, do we need to add text to say that brokers should be able to return a 200 on this new API immediately once the binding() response is sent. Meaning, waiting until the async op completes before this URL is valid would not be appropriate.
https://github.com/openservicebrokerapi/servicebroker/compare/master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR685 add a period before "Additionally".
there are spots where we talk about "broker clients" ( https://github.com/openservicebrokerapi/servicebroker/compare/master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR701 ) and spots where we say "platform" - we may want to be consistent. This might be an issue for the spec itself and not this change though. Just noticed it.

mattmcneeney commented 7 years ago

Thanks for the feedback @duglin. In response to the above:

The reference to binding was a mistake. Fixed.
The service_id, plan_id and operation were left in for consistency. I'm not sure how they are used currently for the existing last_operation endpoint. Anyone?
Agreed that the existing and new endpoints should clarify to stop polling on an error also. Fixed.
I'm not sure I understand do we need to add text to say that brokers should be able to return a 200.... Brokers supporting async bindings should return a 202.
Period added before "Additionally"
I agree with the inconsistent broker client vs platform. I've fixed it in this branch so that we only refer to platform, but happy to move to another branch if preferred (or use broker client instead; platform made more sense to me from a CF perspective).

mattmcneeney commented 7 years ago

Updated spec: link

duglin commented 7 years ago

@mattmcneeney re: "I'm not sure I understand do we need to add text to say that brokers should be able to return a 200.... Brokers supporting async bindings should return a 202."... right now a broker should return a 202 in response to the provision() and bind() operations, if they're doing to do it asynchronously. However, what's not clear is when the URL for the "last_operation" because valid/alive for the new resource. I think its implied that it should be alive immediately but the spec doesn't say so. Which means, its possible for someone to implement it in such a way that the URL will return a 404 until the resource is fully ready. For example, if they simply accept the incoming request and stick it in a queue, then return a 202. They may not do the work to make the polling endpoint available until something starts to processes that entry the queue.

The reason I thought about this is because of a previous comment about error responses to the async polling causing the platform to stop polling. Well, if they stop on a 404 and the broker is kind of goofy and returns a 404 until the resource is ready then the platform will stop prematurely.

I think simply adding text along the lines of:

Once a broker returns a 202 in response to a provision or binding request, the corresponding
`last_operation` endpoint should be ready to respond to polling requests immediately.

should cover it.

mattmcneeney commented 7 years ago

Aha, I understand. Good spot, the spec was definitely a bit loose. I'll add the above note to the spec for all of the 202 responses.

mattmcneeney commented 7 years ago

@duglin How's this? (Whole spec)

duglin commented 7 years ago

@mattmcneeney looks good - thanks!

mattmcneeney commented 7 years ago

Branch updated: https://github.com/mattmcneeney/servicebroker/tree/async-bindings (Depends on #159)

jmrodri commented 7 years ago

@mattmcneeney LGTM

shalako commented 7 years ago

https://github.com/mattmcneeney/servicebroker/compare/get-endpoints...mattmcneeney:async-bindings

mattmcneeney commented 7 years ago

I've started to draft the proposal doc for this: https://docs.google.com/document/d/16yobzCdZCOcKgHMf8gICJnDeoV8Jp3eUM0Y_t4FoEAA/edit#

@jmrodri thanks for offering to help! @avade @shalako info on any CF-specific use cases would be great @duglin @arschles same for your respective platforms

bmelville commented 7 years ago

Added comments to the proposal doc but will mirror them here.

We use identifiers which are not strictly GUID, but have some sanity (e.g., conform to [a-zA-Z0-9-]+), and it would be difficult in some cases to have to start generating, storing, and using additional GUIDs just to conform to the requirement that operation IDs be GUIDs.

This also is contradictory to the other parts of the API spec which recommend but do not require GUIDs.

I'd love to see that aspect relaxed to some degree, to a requirement that maintains sanity of IDs but is not overly restrictive as to put a burden on producers who do not necessarily use GUIDs as their unique identifier.

bmelville commented 7 years ago

One common naming requirement for IDs I've seen is to limit to unreserved URI characters, and perhaps also impose a length limitation (e.g., 64 characters).

https://tools.ietf.org/html/rfc3986#section-2.3

mattmcneeney commented 7 years ago

We should consider #267 when designing these new last operation endpoints. Are folks happy for me to roll @bmelville's suggested changes from that issue into the proposed branch for this issue?

mattmcneeney commented 7 years ago

I've updated the async bindings proposed spec changes following our discussions yesterday.

@jmrodri could you take a look here and let me know what you think? Hopefully I haven't missed anything!

jmrodri commented 7 years ago

@mattmcneeney LGTM I see you clarified the original last operation to include 'Service Instance' which is good. I confirm that the async binding looks like the service instance last_operation which is precisely what we agreed to at the F2F. Nothing stood out to me.

@pmorie LGTM.

mattmcneeney commented 7 years ago

Thanks @jmrodri @pmorie

I I get two more LGTMs on this issue, I'm happy to move this into the validation through implementation phase. @bmelville @shalako @duglin

duglin commented 7 years ago

was the gdoc updated with the proposed changes from the f2f? In particular, I didn't think we were going to add a new "operations" API:

Proposal: Add /si/:id/sb/:id/last_operation[?operation=...] endpoint Only one async anonymous (no “operation” string) at a time Do not touch /si/:id/last_operation[?operation=...] Add text to explain why we can get the last op on a resource when a GET on that resource would fail.

mattmcneeney commented 7 years ago

gdoc isn't but the branch is!

https://github.com/mattmcneeney/servicebroker/blob/async-bindings/spec.md#binding

duglin commented 7 years ago

ok - updated the first comment with the link so its easier to find. thanks

pmorie commented 7 years ago

My only feedback at this point is that I think it would be best if we located the 'async' operations part of the spec as pertains to bindings after describing the operations on bindings instead of before.

kibbles-n-bytes commented 7 years ago

There seems to be ambiguity with the response body of a 202 Accepted for bindings. Right now, it seems like you could have the broker respond with a 202 but also give credentials at that point in time, in effect saying "here's the credentials, but don't use them yet". You could even have this flow for "binding_retrievable": false instances.

Is this a flow we want to support? I'm thinking no, but in either case I think it would be good to clarify.

vaikas commented 7 years ago

I would rather have the resource (credentials, etc.) be available through GET and not through the initial body. Otherwise, you'd have to worry about things like can you support partial ones on the initial request and then reconcile, etc. etc. I think semantically saying you have to poll until the resource either successfully gets created or fails, and if it's successfully created, you can then fetch that resource and the credentials, etc. that it contains.

kibbles-n-bytes commented 7 years ago

I agree; the added complexity is not worth supporting a fringe use case. I think we should clarify that if you receive a 202 Accepted, the platform will ignore any credentials payload that happens to be returned.

duglin commented 7 years ago

While I agree with the proposed direction for async/202's - this sentence, for the body of the 202, in the spec is worrisome:

The response body SHOULD be the same as if the broker were serving the request synchronously.

I'm interpreting the current state of the spec as the exact opposite of what @vaikas-google was proposing. Meaning, the async-response/202 has the final results body, and the last response from the poll to last_operation is almost meaningless, other than to say the op is done. I'm curious to know what the intent of the SHOULD is. Meaning, if the broker chooses to not include the creds then how does the platform finally get them? @shalako ?

Yes, it might be nice to follow the message exchange with a GET, but GET will be optional.

It seems to me that we may have to add something to the spec that says the "real response" can be in one but not both response flows. Meaning, you can return the creds in the async/202 OR in the final last_operation response, but not both. Then the platform won't need to worry about conflicting data. Yes, this means adding it to the last_operation response too.

duglin commented 7 years ago

See: https://github.com/openservicebrokerapi/servicebroker/pull/307 for some proposed wording changes around lines line:

:instance_id MUST be a globally unique non-empty string. :binding_id MUST be a globally unique non-empty string.

duglin commented 7 years ago

Reading the proposal text, I think we should include the binding response (and provision response) in the final response from last_operation - assuming its not part of the original async response/202. This will save people to trouble of doing a GET immediately and will always work even when GET isn't supported.

duglin commented 7 years ago

why is in the case of a Binding request snippet in the async binding ( https://github.com/mattmcneeney/servicebroker/blob/async-bindings/spec.md#polling-last-operation-for-service-bindings ) section? Isn't the entire section about "bindings" ?

duglin commented 7 years ago

In this

If brokers do not return this operation field, only one asynchronous operation MAY be supported at a time.

I think this needs to use a MUST instead of MAY which implies OPTIONAL, e.g.:

If brokers do not return this operation field then brokers MUST NOT return any other asynchronous response without an operation field while the first request is still being processed. In other words, without the data provided by the operation field, it would not be possible for the broker to distinguish two polling requests to the last_operation endpoint for two different bindings, therefore, only one such request can be active at a time.

vaikas commented 7 years ago

@duglin Regarding handing the credentials (or any other binding related return values) right up front kind of defeats the purpose of the async provisioning. If they take any time to create, then why even bother with the async request?

vaikas commented 7 years ago

@duglin I also think that the binding should be gettable resource and in that case last operation polling should only be used to tell whether the requested resource (binding) was created successfully or if it failed, polluting that response with the resulting resource information seems wonky to me. It's a single GET so the overhead for doing that is minimal and I thought we wanted to make Bindings gettable so why wouldn't we return it there for consistency? By consistency I mean if bindings are gettable resource and hence useful for reconciling differences then wouldn't it make more sense to just have one place to fetch the credentials?

openservicebrokerapi / servicebroker

Support async binding operations #137