kibbles-n-bytes closed this issue 5 years ago.
What about the state where I just want that object out of my k8s resource space and I don't care that the broker says I can't delete it... maybe it is a borrowed resource or something?
@n3wscott In that case, you can remove the resource today by doing a `kubectl edit` and removing the Service Catalog finalizer after having issued a delete request.
I'm kind of hesitant about doing this for a few reasons:
I don't think any of the use-cases we had for this or motivation made it into this issue. Before we proceed on this, I think we need to understand exactly why we feel like we need to make a fairly significant change to the API and be very crisp about when to use this feature. As an example: if you were just to look at this issue currently, it's not really clear what specific situations this is intended to resolve. Is it when parameters are in a bad state? When an invalid plan is selected?
On a longer term basis, I'm very concerned about adding things to our API that don't behave as you would expect them to from using other parts of kube. IMO we should be moving OSB to be more compatible with kube, not the other way around.
It is fundamentally different from what users will expect from other parts of kube.
It's possible to model this declaratively. An `existsExternally` bool that defaults to `true` would allow the user to declare their intent as to whether the resource should be persisted on the broker, and the controller would reconcile this intent with the state of the world, provisioning/deprovisioning as necessary.
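A minimal Go sketch of that reconciliation idea (hypothetical, heavily simplified types — the real ServiceInstance API and controller in service-catalog are far richer):

```go
package main

import "fmt"

// InstanceSpec and InstanceStatus are illustrative stand-ins for the
// proposed field and observed broker state; they are not the actual
// service-catalog types.
type InstanceSpec struct {
	ExistsExternally bool // desired: should the instance exist on the broker?
}

type InstanceStatus struct {
	Provisioned bool // observed: does the instance exist on the broker?
}

// reconcile compares desired vs. observed state and decides which
// broker operation (if any) the controller should issue.
func reconcile(spec InstanceSpec, status InstanceStatus) string {
	switch {
	case spec.ExistsExternally && !status.Provisioned:
		return "provision"
	case !spec.ExistsExternally && status.Provisioned:
		return "deprovision"
	default:
		return "none"
	}
}

func main() {
	// User flips existsExternally to false on a provisioned instance.
	fmt.Println(reconcile(InstanceSpec{ExistsExternally: false}, InstanceStatus{Provisioned: true}))
}
```

The point of the declarative shape is that setting the flag back to `true` later is just another spec update for the same loop to reconcile.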
What happens after you set this field? Can you update the plan? Can you change the parameters?
You can, but presumably the deprovision operation would take precedence. And while the resource is marked as `!existsExternally`, the spec updates wouldn't try anything on the broker. Reminds me a bit of `paused` on Deployments?
What happens if you set this field and then delete the resource while async deprovision is happening?
The API server would set the deletion timestamp, which the controller would notice after async deprovision finishes, and it'll go through the normal delete flow which will just remove the finalizer, just like the case where we delete an unprovisioned instance.
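The delete flow described here could be condensed into a sketch like the following (hypothetical model; field and function names are illustrative, not the actual controller code):

```go
package main

import "fmt"

// instance is a toy model of the state the controller inspects when
// handling deletion, per the flow described above.
type instance struct {
	deletionTimestampSet bool
	asyncOpInProgress    bool
	provisioned          bool
	finalizers           []string
}

// handleDelete mirrors the described behavior: wait for any in-flight
// async deprovision to finish, then remove the Service Catalog
// finalizer so the API server can garbage-collect the object.
func handleDelete(in *instance) string {
	if !in.deletionTimestampSet {
		return "not deleting"
	}
	if in.asyncOpInProgress {
		return "requeue: wait for async deprovision to finish"
	}
	if in.provisioned {
		return "deprovision first"
	}
	in.finalizers = nil // remove the Service Catalog finalizer
	return "finalizer removed; API server deletes the object"
}

func main() {
	// Deprovision already finished; deletion timestamp is set.
	in := &instance{deletionTimestampSet: true, finalizers: []string{"kubernetes-incubator/service-catalog"}}
	fmt.Println(handleDelete(in))
}
```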
How will users tell that an instance has been deprovisioned but not deleted?
There would be no `ExternalProperties`, and our `Provisioned` value in our status would be `NotProvisioned`.
Will users be able to create new bindings to deprovisioned instances?
They could create the k8s resource, but our controller wouldn't create them on the broker because the instance wouldn't be in the Ready state.
Will you be able to create a new ServiceInstance with this field set?
Sure; the controller just wouldn't provision it and it'd act like a paused Deployment.
We will still have to support the exact same behavior as we do currently, so you will still be able to trigger a deprovision by deleting the instance. How will we tell users which option to choose?
If the k8s user wants the option to still use the instance after having requested a deprovision, on the off chance they either didn't have permission or want to revive the resource later on the broker side, they should set `!existsExternally`. If they are sure they no longer want the k8s resource, they should request a delete, which they understand to be 100% fatal to the resource.
As for motivations, @duglin could you describe your situation here to give some context?
I think @kibbles-n-bytes said it well in his opening comment:
As deletion of k8s resources is permanent once it has been accepted by the API server, we currently get into a strange state where if a deprovision/unbind request fails at the broker, our k8s resource will be stuck permanently in the failed deletion state.
This is to deal with cases where the Broker rejects the delete. At some point we need to allow the system to get back into a pre-delete-request state but Kube won't let us do that if we start down the true delete() path.
I agree with @pmorie that this is a strange feature. If Broker can't delete the resource, or got stuck, it's broker's problem. There is no way back, and there is no guarantee that you can just pretend that you never tried to delete resources - Broker might have already deleted some data.
Also, what about cascading deletion? We use `ownerReference` to build a dependency graph. When a user sends a delete request with foreground deletion (which will be the default for kubectl in the next release, see https://github.com/kubernetes/kubernetes/pull/59851), GC will start with the leaves and then move to the parent.
How will it work with the delete field? I'm concerned that we go against the Kubernetes nature here. Kubernetes doesn't support "undeleting" for a reason, and such a generic service as Service Catalog cannot guarantee that "undeleting" will always be possible and safe.
I can think of safe "undelete" under certain conditions:
If we limit the scope of the delete flag like this, it could be tolerable. But I'm still concerned that we go in the non-kube direction, as @pmorie already said.
This is to deal with cases where the Broker rejects the delete. At some point we need to allow the system to get back into a pre-delete-request state but Kube won't let us do that if we start down the true delete() path.
@duglin Could you elaborate on what "rejects" means? Is it because of some sort of authorization? If so, the problem is that that authorization happens after the state desired by the user has been persisted in the api server. It needs to happen BEFORE state is persisted. Kubernetes provides a mechanism to do that - admission hooks. What is the reason it cannot be used?
The "Authn -> Authz -> State is persisted -> Authz" flow just does not work. I expect more issues with this approach. Spec locking, "imperative deprovision" - this is not how other resources work in Kubernetes. Please don't do this to us, Service Catalog users. We want consistent behaviour across all Kubernetes resources.
Another issue with this "get back to a good state after a delete" is that AFAIK the OSB spec does not guarantee that an instance is in a valid state after a failed delete. Basically this "imperative deprovision" cannot be used with an arbitrary broker without modifying the OSB spec. Am I mistaken here?
Have you discussed this issue with someone from SIG-Architecture? I strongly recommend that.
https://github.com/kubernetes-incubator/service-catalog/issues/1942#issuecomment-391504929
If Broker can't delete the resource, or got stuck, it's broker's problem. There is no way back, and there is no guarantee that you can just pretend that you never tried to delete resources - Broker might have already deleted some data.
In this case the broker won't have done anything. Kube is stuck in deleting even though it should not be deleted.
GC: should not be in effect here until the controller issues the actual DELETE, at which point everything should occur as normal.
https://github.com/kubernetes-incubator/service-catalog/issues/1942#issuecomment-391510865
Conditions: I will have to think about these. Potentially.
https://github.com/kubernetes-incubator/service-catalog/issues/1942#issuecomment-391538651
what "rejects" means? Is it because of some sort of authorization?
Yes. Authorization to access the broker.
admission hooks. What is the reason it cannot be used?
I proposed this at the F2F and was shot down in favor of this direction. The main problem here is that there is no preflight check to OSBAPI. The admission controller would have to actually do all of the deletion actions, and then do persistence.
"Authn -> Authz -> State is persisted -> Authz" flow just does not work.
I think this hits at the very core of the issue. Unfortunately, once the state of DELETE is persisted, it cannot be un-persisted. Thus, we must find a workaround.
AFAIK OSB spec does not guarantee that an instance is in a valid state after a failed delete. Basically this "imperative deprovision" cannot be used with an arbitrary broker without modifying the OSB spec. Am I mistaken here?
Reading https://github.com/openservicebrokerapi/servicebroker/blob/v2.13/spec.md#originating-identity - I am unsure if there is a specific HTTP code to use as a result of this failure.
Reading https://github.com/openservicebrokerapi/servicebroker/blob/v2.13/spec.md#authentication - maybe `401`.
Reading https://github.com/openservicebrokerapi/servicebroker/blob/v2.13/spec.md#unbinding - maybe `400`.
Reading https://github.com/openservicebrokerapi/servicebroker/blob/v2.13/spec.md#deprovisioning - maybe `400`.
Have you discussed this issue with someone from SIG-Architecture? I strongly recommend that.
No.
Maybe 400 (deprovisioning)
Can you think of any real situation where this might happen, assuming that Service Catalog has no bugs and the OSB broker is implemented correctly according to the spec? Service Catalog just sends `instance_id` (along with `service_id` and `plan_id`); what can possibly be incorrect in this request? If a deprovision request fails with `400`, to me it's a sign of either a bug in the OSB broker, or some unknown state of the underlying resource. And in that case getting stuck in a terminal deletion state is the best you can do - the user can't fix this.
`403 Forbidden` is a more adequate rejection result for a deprovision request (and the real reason for this change, based on our previous discussions). Interestingly enough, `403` is not among the explicitly allowed response codes for an OSB deprovision request.
The spec just vaguely says:
Responses with any other status code will be interpreted as a failure and the service instance will remain in the marketplace database.
OK, let's assume that `403` can be returned by any broker. What do we do in that case? Does the ServiceInstance get rolled back to the latest healthy state? Will the user be able to update the spec and continue using it, as if the instance was never attempted to be deleted?
And what about requests for which you get `201 Accepted` and then get `status: failed` in the `last_operation` polling result? Will you just retry deleting as currently, with `deletionTimestamp` set?
https://github.com/kubernetes-incubator/service-catalog/issues/1942#issuecomment-391717768
What do we do in that case? Does the ServiceInstance get rolled back to the latest healthy state? Will user be able to update the spec and continue using it, as if the instance was never attempted to delete?
Yes, I think this is the concept. The appropriate user should barely know that someone unauthorized was poking at their stuff.
And what about requests for which you get 201 Accepted and then get status: failed in last_operation polling result? Will you just retry deleting as currently with deletionTimestamp set?
I do not understand where this case is coming from.
The appropriate user should barely know that someone unauthorized was poking at their stuff.
This is not how Kubernetes works. Once the spec is updated ("desired state"), it stays the same and Kubernetes never rolls it back, even if it fails to make it happen (make "actual state" match the "desired state") after many retries. It just gets stuck with the failed status.
Kubernetes can only reject the delete request on the fly, before it persists the changes - during authz and other validation, e.g. via admission controllers.
I do not understand where this case is coming from.
I am just trying to understand the scope of this change. From the issue description:
At the April F2F we decided to add deprovision/unbind flags to allow the user to attempt a deprovision/unbind without having to delete the Service Catalog k8s resource.
I wasn't at the F2F and there are no implementation/behavior details. Looks like there is actually no agreement on that yet? The design spec in #2037 is very different from what the issue says.
@kibbles-n-bytes after reading your comments about the declarative `existsExternally` flag, I agree that it's more tolerable than what #2037 says with the `deletionRequested` flag and the controller automatically deleting the instance after successful deprovisioning. I would prefer to have an opposite flag though - one that is `false` by default. Something like the `paused` deployment flag you mentioned. This would make it clear that this is an opt-in flag that you don't have to use or pay attention to.
And while the resource is marked as !existsExternally, the spec updates wouldn't try anything on the broker.
@kibbles-n-bytes and what happens if the user sets `existsExternally` back to `true`? Will the instance be provisioned again after deprovisioning successfully finished? I presume the answer is "yes". I suspect our code will get a lot more complicated than it already is.
I would prefer to have an opposite flag though - that is false by default.
Another argument for the flag being `false` by default is backward compatibility. You cannot introduce a new flag that is `true` by default - all the existing instances don't have this field.
Another thing to think about: even after adding a new flag, a user can still issue `kubectl delete serviceinstance ...` and you'll end up in the same situation as now.
To mitigate this you would have to implement an admission controller that rejects the `DELETE` request if the `ServiceInstance` hasn't reconciled yet, or `ProvisionStatus != NotProvisioned`, or `DeprovisionStatus != NotRequired` - i.e. only allow deletion after successful deprovisioning. Otherwise there is still a chance for an instance to get stuck forever because of a `403`.
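The proposed admission rule could be sketched roughly like this in Go (the status field names and values here are illustrative, loosely based on the ones mentioned above — not the actual service-catalog constants or webhook wiring):

```go
package main

import "fmt"

// instanceStatus is a toy stand-in for the observed status an
// admission controller would inspect before admitting a DELETE.
type instanceStatus struct {
	Reconciled        bool
	ProvisionStatus   string // e.g. "Provisioned" / "NotProvisioned"
	DeprovisionStatus string // e.g. "Required" / "NotRequired"
}

// allowDelete implements the rule proposed above: only admit a DELETE
// once the instance has reconciled and deprovisioning is demonstrably
// complete, so the object can never get stuck on a broker 403.
func allowDelete(s instanceStatus) bool {
	return s.Reconciled &&
		s.ProvisionStatus == "NotProvisioned" &&
		s.DeprovisionStatus == "NotRequired"
}

func main() {
	fmt.Println(allowDelete(instanceStatus{Reconciled: true, ProvisionStatus: "Provisioned", DeprovisionStatus: "Required"}))
	fmt.Println(allowDelete(instanceStatus{Reconciled: true, ProvisionStatus: "NotProvisioned", DeprovisionStatus: "NotRequired"}))
}
```

In a real cluster this predicate would sit behind a validating admission webhook registered for DELETE operations on `ServiceInstance`.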
P.S. If we agree to make such change, this whole hackery must be opt-in of course under a feature flag. I don't think the majority of Service Catalog users actually need any of that.
Kubernetes can only reject the delete request on the fly, before it persists the changes - during authz and other validation, e.g. via admission controllers.
It is impossible for kube to know that it shouldn't allow the delete to be persisted before contacting the broker and attempting a delete.
Another thing to think about: even after adding a new flag user can still issue kubectl delete serviceinstance ... and you'll end up in the same situation as now.
We can prevent this with RBAC rules for the actual user. This is as yet unimplemented and should be implemented by https://github.com/kubernetes-incubator/service-catalog/issues/2061
To mitigate this you would have to implement an admission controller that rejects DELETE request if the ServiceInstance hasn't reconciled yet or ProvisionStatus != NotProvisioned or DeprovisionStatus != NotRequired, i.e. only allow deletion after successful deprovisioning. Otherwise there is still a chance for an instance to get stuck forever because of 403.
Yes. I made this suggestion during the F2F, among others. The delete flag was proposed by someone else. I had thought of it, but did not think of bringing it up as it's so outside the bounds of kube behavior. This extra flag set on update was decided during the meeting to be the most appropriate response.
P.S. If we agree to make such change, this whole hackery must be opt-in of course under a feature flag. I don't think the majority of Service Catalog users actually need any of that.
I agree.
We can prevent this with RBAC rules for the actual user. This is as yet unimplemented and should be implemented by #2061
No, you can't, if the design is how @kibbles-n-bytes described it (which is different from how it's written in #2037) - the user still needs to manually delete the `ServiceInstance` after deprovisioning succeeds. As I said above, I prefer @kibbles-n-bytes's design with the declarative flag that controls provisioning/deprovisioning separately from creation/deletion (i.e. the user can set and unset the flag at any time, any number of times); it's less hacky than `deletionRequested` and more flexible.
I would like to discuss this issue at the next SIG call (June 4) and not merge anything until we reach the consensus on design.
P.S. If we agree to make such change, this whole hackery must be opt-in of course under a feature flag. I don't think the majority of Service Catalog users actually need any of that.
I agree.
If we solve this via some new flag then we don't need to do anything special, like set a feature flag. If people want to use the normal delete flow then they will and it should just work. If they want to use the flag, then they will, and it'll just work. The opt-in is the explicit use of one of those two mechanisms. However, blocking the normal kube delete is something that I agree will need some kind of trigger. Whether that's done on a global, broker, instance or binding level is still TBD I guess but I agree that should be opt-in until the time we decide it should be the default flow for deprovision/unbind.
https://github.com/kubernetes-incubator/service-catalog/issues/1942#issuecomment-391914626 Don't apimachinery defaults or versioning handle this?
@mhbauer no, defaults apply only to new objects. Versioning - yes, if you support multiple versions, i.e. you would have to declare `v2` and duplicate all the types. Currently we only support one version at a time and keep it backward compatible.
v1beta2?
We cannot possibly be the first people to want to add a field with a non-Go-default value.
Yes, simultaneously supporting both `v1beta1` and `v1beta2` for a long time (enough time for all users to migrate to `v1beta2`) would work. Which also means having a lot of extra boilerplate. Do we want this? I doubt that.
We can also make the field nullable (`*bool`) and treat `nil` as `true`, but that's ugly.
The flag that is `false` by default could probably be called `disabled` - with the semantics that a disabled instance shouldn't exist externally.
@kibbles-n-bytes what happens to existing bindings if we mark an instance with `existsExternally: false`? Currently deprovisioning gets blocked by existing bindings. Do you also want to add `existsExternally` to bindings? Can you proceed with deprovisioning if a binding exists but is marked with `existsExternally: false`?
Our code could get very messy and entangled with such corner cases.
https://github.com/kubernetes-incubator/service-catalog/issues/1942#issuecomment-392232135
Do you also want to add existsExternally to bindings?
Both Instances and Bindings need this behavior, so both should get the new field.
AFAIK bindings still lack the retry loop that I added to instances, i.e. the reconciliation loop gets stuck after any error HTTP status received from the broker. This should be fixed first, before adding this behavior, IMO.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
As deletion of k8s resources is permanent once it has been accepted by the API server, we currently get into a strange state where if a deprovision/unbind request fails at the broker, our k8s resource will be stuck permanently in the failed deletion state.
At the April F2F we decided to add deprovision/unbind flags to allow the user to attempt a deprovision/unbind without having to delete the Service Catalog k8s resource.