cholick commented 5 years ago

Closes #625

Adds a timeout value (at the catalog level for a service), that the platform SHOULD use as a hint to give up and declare failure.

See #625 for additional background details.

cfdreddbot commented 5 years ago

:white_check_mark: Hey cholick! The commit authors and yourself have already signed the CLA.

leonwanghui commented 5 years ago

Good solution! Currently in osb-checker we have to add some extra fields "pollingInterval": 5, "maxPollingNum": 60 in config_mock.json to configure the timeout, this PR will solve our problem once and for all : )

duglin commented 5 years ago

As this stands it seems reasonable. However, at one point in time we talked about having the broker return some info on the polling response to tell the Platform when to poll next - so it doesn't waste its time doing it too soon. I'm wondering if we should have the polling response include this data as well? I'm wondering if there will be cases where the maximum polling time would be plan based or the Broker might not know how long it'll take until it starts. For example, perhaps a free plan can be provisioned in seconds because there's not a lot of infrastructure behind it, but a Gold plan might take more time - and therefore need a larger max polling duration. We could also then make the 202 on the provision, optionally, return this metadata (ie. the polling non-finished response http body), so the platform can use this info immediately and not wait for the first poll.

what do people think?

duglin commented 5 years ago

https://github.com/openservicebrokerapi/servicebroker/pull/621 is the one I was thinking of

mattmcneeney commented 5 years ago

Ah, good point @duglin

@cholick - in #625 you said:

As a service author, I'm often in a much better position to decide what that timeout value should be. For many services, values as short as an hour would be appropriate.

Do you think #621 solves that problem (i.e. a service broker can tell a Platform when next to poll it)?

cholick commented 5 years ago

621 as written doesn't solve the problem, but yes, it could be enhanced to include a maximum. The header is a bad fit at that point, though, since there are two values.

I'm concerned that combining these two concepts isn't a great fit:

One is a reference to the point at which the platform polls the service's status and the other is in reference to when the server stated polling. That seems like a tricky thing to convey clearly.
What does it mean for the platform if that value changes?
The platform wouldn't have the value until the first poll, so I think it's also harder to implement correctly

mattmcneeney commented 5 years ago

Makes sense @cholick - I think this should be done here as #621 will likely be fixed using an HTTP header

duglin commented 5 years ago

One is a reference to the point at which the platform polls the service's status and the other is in reference to when the server stated polling. That seems like a tricky thing to convey clearly.

Sorry I wasn't clear. I wasn't necessarily suggesting to merge both things into one header. We could put one into a header and one in the http body. But, TBH, at that point I'd kind of prefer to put them both in the body (as separate fields) - for consistency.

What does it mean for the platform if that value changes?

If the current time is past the duration returned then it stops.

The platform wouldn't have the value until the first poll, so I think it's also harder to implement correctly

If the data were returned in original 202 response then no poll is needed to get the first value.

I think being able to change the timeout based on the plan might be really useful once people have this data at all.

cholick commented 5 years ago

With the payload in the body, that does make sense to me @duglin and would address both features.

I do still think it's a bit confusing to have a body with a duration field (poll again in X) and a maximum from the provision time field (stop polling after X, measured from the first provision point). My concern is that platform implementors will interpret both those fields as durations from the point at which each pole is performed. As a platform builder, I would also assume the maximum wouldn't change, so if change is allowed on a poll, I think it's important to call that out.

duglin commented 5 years ago

ok I think I see your point, and it makes sense - let's keep them separate and not allow the max to be changed on the fly. So, one last question... do you think the max should be on the service (as you have it now) or a property of the plan? I'm wondering if certain plans might require more time than others since some could be very light weight and others take much longer to provision.

cholick commented 5 years ago

Changing to plan makes sense to me. I'd gone chosen the simpler option, but the use case you outline makes sense.

@duglin @mattmcneeney - shall move that to the plan and update the PR?

mattmcneeney commented 5 years ago

That makes sense to me @cholick

duglin commented 5 years ago

LGTM

one question I have is where we should have it at the Service and Plan level, where the Plan one overrides the Service one - like "bindable".

cholick commented 5 years ago

My reasoning was to lean toward the simplicity of leaving it in a single place on the broker side (service and plan would mean having 3 potential places for the timeout: platform, service, and plan). My theory is that most broker implementations uses a base object of some kind for the plan in implementation, so using the same value wouldn't mean more code. Free is the counter-example to bindable, where each plan specifies.

I don't feel strongly about that, though - fine changing if you want @duglin.

mattmcneeney commented 5 years ago

I'd vote for simplicity in this case over consistency. If we hear demand to have a plan-level override, we can always add that later.

cfdreddbot commented 5 years ago

:white_check_mark: Hey cholick! The commit authors and yourself have already signed the CLA.

mattmcneeney commented 5 years ago

Just a thought - should this be seconds rather than minutes to allow for finer granularity?

cholick commented 5 years ago

In the existing code I know of (CF platform's timeout), the unit is minutes, so I chose that to be consistent.

mattmcneeney commented 5 years ago

That makes sense. But for consistency, how would you feel about having this in seconds for the spec?

Looking at that code you linked:

broker_client_default_async_poll_interval_seconds: 60
broker_client_max_async_poll_duration_minutes: 10080

Does it not offend you that one uses seconds and then one uses minutes?! 🤣

cholick commented 5 years ago

Good point @mattmcneeney - updated unit to be internally consistent with spec.

openservicebrokerapi / servicebroker

Service Specific Async Polling Timeout #627

621 as written doesn't solve the problem, but yes, it could be enhanced to include a maximum. The header is a bad fit at that point, though, since there are two values.