Closed by ncameronbritt 7 years ago
@pmorie I think this is a service catalog problem; please assign as required.
Issue 1: the error message "Error deprovisioning ServiceInstance"... is wrong. Reading the code, I think the SC means to report that the provision has failed (which is indeed the case).
I see no evidence that the SC immediately attempts to deprovision as a result of the failed provision (nor am I saying that it should).
Issue 2: I am able to remove the failed provision via the web console, but this does not result in the SC calling the TSB to deprovision. As a result the related objects are not deleted by the TSB.
I don't know what the OSBAPI spec says in this case, but I think that fundamentally the SC should notify the broker in question when it wants to deprovision a failed provision.
The OSB spec does 'orphan mitigation' if there is a communication error or timeout that means that the platform can't be sure if the provision succeeded or not. The broker is responsible for cleaning up after failed provisions that it knows about.
> The OSB spec does 'orphan mitigation' if there is a communication error or timeout that means that the platform can't be sure if the provision succeeded or not.

Pretty sure that is not the case here. We told the SC the provision failed.
I'm not sure how the SC could ever not know: it's an async provision flow, so the SC can just keep calling last_operation until it gets a result.
(also how is the TSB supposed to know if the SC got a communication error?)
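The async flow described above can be sketched as a simple polling loop. This is an illustrative sketch only, not service-catalog code: the function name and the fake broker responses are hypothetical, standing in for repeated GETs to the broker's last_operation endpoint.

```python
# Sketch of the async provision flow: after accepting a provision
# request, the platform polls last_operation until the broker reports a
# terminal state. fetch_state is a hypothetical stand-in for
# GET /v2/service_instances/{id}/last_operation.

def poll_last_operation(fetch_state, max_polls=10):
    """Poll until the broker reports 'succeeded' or 'failed'."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in ("succeeded", "failed"):
            return state  # terminal: the platform now knows the outcome
    return "in progress"  # gave up; outcome still unknown

# Fake broker: 'in progress' twice, then 'failed' (as in this issue).
responses = iter(["in progress", "in progress", "failed"])
result = poll_last_operation(lambda: next(responses))
```

The point being made in the thread is that with this loop the SC always ends up knowing whether the provision failed, so the "can't be sure" precondition for orphan mitigation shouldn't arise here.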
@bparees @jim-minter My understanding from @pmorie is that the service catalog does not call deprovision by design if the provision fails. It's assumed that the broker will clean things up automatically on a failed provision and there's nothing for the service catalog to do.
I think that the spec is deficient in specifying what should happen here (I don't think the orphan mitigation piece covers it). What implemented behaviour do other brokers have?
So far I haven't found any indication that the Ansible broker self-deprovisions in the case of a provision failure. @jmrodri ? I've had a cursory look and haven't found any code implementing this in the sample brokers listed at https://github.com/openservicebrokerapi/servicebroker/blob/master/gettingStarted.md either. I don't think the spec is clear. I don't think it makes great sense from an API extensibility perspective for all brokers to unilaterally deprovision upon a provision failure, and I don't think it's very pragmatic at this point to start making them do it.
@jim-minter to the best of my knowledge we don't do any active orphan mitigation. If a provision request fails we return an appropriate error response (or a 500 if it's catastrophic) back to the service-catalog.
It was my understanding that the service-catalog would call deprovision on a failed attempt just to ensure that there were no lingering items left in the broker.
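The broker-side behaviour described above can be sketched roughly as follows. This is a hedged illustration, not Ansible broker code: the function is hypothetical, but the error body shape follows the OSB spec's error response format (a JSON object with a human-readable `description` field).

```python
# Sketch of a broker's provision response: on failure, return an
# OSB-style error body (or a 500 if it's catastrophic) and do no
# cleanup of its own. provision_response is hypothetical.

import json

def provision_response(ok, description=""):
    """Build an (HTTP status, JSON body) pair for a provision attempt."""
    if ok:
        return 201, json.dumps({})
    # OSB error responses carry a 'description' explaining the failure.
    return 500, json.dumps({"description": description})

status, body = provision_response(False, "template instantiation failed")
```

Under this model the broker only reports the failure; whether anyone then calls deprovision is exactly the gap the rest of the thread is about.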
https://github.com/openservicebrokerapi/servicebroker/blob/master/spec.md#orphans
> To mitigate orphan service instances and bindings, the marketplace SHOULD attempt to delete resources it cannot be sure were successfully created, and SHOULD keep trying to delete them until the service broker responds with a success.
And in terms of a failed provision, I'd assume this would be the case:
> If the platform marketplace encounters an internal error provisioning a service instance or binding (for example, saving to the database fails), then it MUST at least send a single delete or unbind request to the service broker to prevent creation of an orphan.
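The two spec clauses quoted above can be sketched together in a few lines. This is an assumption-laden illustration, not service-catalog code: `send_delete` is a hypothetical stand-in for DELETE /v2/service_instances/{id}, and the retry bound is arbitrary.

```python
# Sketch of platform-side orphan mitigation per the quoted spec text:
# the marketplace MUST send at least one delete request and SHOULD keep
# retrying until the broker responds with a success.

def mitigate_orphan(send_delete, max_retries=5):
    """Keep sending deprovision requests until one succeeds.

    send_delete returns True on a success-style broker response.
    Returns the number of delete attempts made.
    """
    attempts = 0
    for _ in range(max_retries):
        attempts += 1  # guarantees at least a single delete request
        if send_delete():
            return attempts  # broker acknowledged; orphan mitigated
    return attempts  # still orphaned after max_retries

# Fake broker that rejects the delete twice before accepting it.
replies = iter([False, False, True])
attempts = mitigate_orphan(lambda: next(replies))
```

Note this clause covers the platform's own internal errors; the thread's open question is whether a broker-reported provision failure should trigger the same delete.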
So, there is a gap in the spec when it comes to what the platform should do and what the broker should expect in this situation. We're working on resolving it - see https://github.com/openservicebrokerapi/servicebroker/issues/348.
This should be fixed now that #17075 has merged. @ncameronbritt can you reopen this if you still see this behavior?
I've also verified the fix. @ncameronbritt You'll need to pull the latest openshift/origin-service-catalog image when you cluster up to test.
Error deprovisioning ServiceInstance "crash-2/mysql-ephemeral-df1hg" of ServiceClass "mysql-ephemeral" at ServiceBroker "template-service-broker": ""
Service instance YAMLs:
- while provisioning: https://gist.github.com/ncameronbritt/64ea769cc377fb83a76bc14c06e97b9f
- after failure: https://gist.github.com/ncameronbritt/3d94410f8b546b0f3e8d9edf817ab4a3
- after deleting: https://gist.github.com/ncameronbritt/2a125939ab8da92e0d4bad6a5b3cfd79
Controller manager logs: https://gist.github.com/ncameronbritt/64e230dddbbc01a93d7b647d5e1ab6c2
@bparees @pmorie @spadgett
Version
oc v3.7.0-alpha.1+3fa1968-489
kubernetes v1.7.0+80709908fd
features: Basic-Auth

Server https://127.0.0.1:8443
openshift v3.7.0-alpha.1+50bb01b-594
kubernetes v1.7.0+80709908fd
Steps To Reproduce
Current Result
Service instance is marked for deletion, but does not actually get deleted or removed from the UI.
Expected Result
Failed Service instance is actually deleted.