q42jaap opened this issue 4 years ago
/sig cli
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-contributor-experience at kubernetes/community. /close
@fejta-bot: Closing this issue.
[root@daocloud ~]# kubectl rollout history deployment/mysql
deployment.apps/mysql
REVISION CHANGE-CAUSE
2 <none>
3 custom message revision 3
5 custom message revision 5
6 custom message revision 6
[root@daocloud ~]# kubectl rollout undo deployment/mysql
deployment.apps/mysql rolled back
[root@daocloud ~]# kubectl rollout history deployment/mysql
deployment.apps/mysql
REVISION CHANGE-CAUSE
2 <none>
3 custom message revision 3
6 custom message revision 6
7 custom message revision 5
I think this is something expected. When you do rollout undo, the previous configuration is applied again and the change-cause is set back to the previous one. Not sure how we can improve the current behavior here.
If you want to change the change-cause after rollout undo, you can update it by annotating the Deployment:
kubectl annotate deployment.v1.apps/nginx-deployment kubernetes.io/change-cause="image updated to 1.16.1"
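Adapted to the mysql Deployment from the history output above, that would look something like the following sketch (the change-cause message is illustrative, and --overwrite is needed if the annotation already exists after the undo):
# hypothetical message; adjust to whatever you want revision 7 to say
kubectl annotate deployment/mysql kubernetes.io/change-cause="rolled back to revision 5" --overwrite
kubectl rollout history deployment/mysql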
/reopen
@johngmyers: Reopened this issue.
@q42jaap: This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/remove-lifecycle rotten
I also ran into this. A service had changed the names of containers and volumes and that apply had gotten rolled back. Upon rolling forward again, the old containers/volumes were not removed.
Reproduction procedure:
cat >sample.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  labels:
    app: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
      - name: foo
        image: gcr.io/google_containers/pause-amd64:3.0
EOF
cat >sample-new.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  labels:
    app: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
      - name: bar
        image: gcr.io/google_containers/pause-amd64:3.0
EOF
kubectl apply -f sample.yaml
kubectl apply -f sample-new.yaml
kubectl rollout undo deployment/test-app
kubectl apply -f sample-new.yaml
Expected result: Deployment containing one container named bar.
Actual result: Deployment containing two containers: foo and bar.
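One way to verify which container names ended up in the Deployment after the final apply (a sketch against the test-app Deployment from the reproduction above):
# prints the container names in the pod template, e.g. "foo bar" in the buggy case
kubectl get deployment test-app -o jsonpath='{.spec.template.spec.containers[*].name}'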
@pacoxu your comment does not address the issue as described. This is about kubectl.kubernetes.io/last-applied-configuration, not kubernetes.io/change-cause.
/kind bug /remove-kind feature
I still consider this expected behavior. kubectl.kubernetes.io/last-applied-configuration should be added only when the user runs kubectl apply; kubectl rollout should not change the annotation.
/cc @eddiezane
In the current behavior, after a kubectl rollout undo
the kubectl.kubernetes.io/last-applied-configuration
refers to a configuration that is no longer the last applied.
As I have demonstrated, the current behavior causes subsequent kubectl apply
commands to sometimes put the resource into an unexpected state.
This is the cause of this issue https://github.com/kubernetes/kubernetes/issues/77058
We discussed this at last week's SIG meeting and will follow up again next week after taking some time to reflect. You can listen to the conversation starting at https://youtu.be/Wh6eU1AfHBY?t=294. Notes follow.
This is happening because the implementation of kubectl rollout undo
essentially copies the spec of the targeted ReplicaSet into the Deployment, without touching the last-applied annotation. That means last-applied continues to reflect the spec from the apply operation that created the newer ReplicaSet we moved off of, not the older one whose spec is now back in the Deployment. Because of this, the strategic merge calculation performed by the next kubectl apply
will be making an illogical and potentially invalid comparison. In the case where the set of fields under management differ among the three points in time in question, apply
may draw conclusions about field ownership that are perfectly correct given the inputs, but incorrect given the real history / actual user intent.
In @johngmyers example, when the rollback happens, the Deployment ends up in a state where the spec has container "foo", and the last-applied annotation has container "bar". When apply
subsequently runs with container "bar" added again, it sees only "bar" in the annotation. It therefore assumes that a different actor added "foo" and it should be left alone.
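To see this state directly in the reproduction above, you can dump the last-applied annotation right after the undo (a sketch using kubectl apply view-last-applied):
# after `kubectl rollout undo deployment/test-app`, this still prints the sample-new.yaml
# spec (container "bar"), even though the live pod template is back to container "foo"
kubectl apply view-last-applied deployment/test-app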
This is impossible to fix with code alone: we'd need to start keeping more historical data than we do today. Specifically, we'd need to preserve the last-applied information by copying it to ReplicaSet. It is deliberately not being copied over today, because doing so would cause apply
to think it manages the ReplicaSet directly (it doesn't). However, we could copy the data over into a different annotation, and then perform the reverse translation when copying back during a rollback. Something similar would need to be done for the newer metadata.managedFields
used by server-side apply as well. Overall, it would be a significant increase in the size of all ReplicaSet objects created from Deployments managed by apply
, for all users. We need to carefully consider whether this bug fix is worth that non-trivial cost.
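As a quick check of the current behavior described here, the ReplicaSets created for the test-app Deployment above carry no last-applied annotation today (a sketch; grep -c prints 0 and exits non-zero when nothing matches):
kubectl get rs -l app=test-app -o yaml | grep -c last-applied-configuration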
SIG-CLI often faces challenges related to the interaction between the imperative and declarative commands kubectl provides. For example, making changes with kubectl edit
between apply
operations is capable of causing a very similar problem. My personal opinion is that rollout undo
falls in a grey area between the two types of command. The user's intent in this case isn't to make a manual change; it's to roll back the entire Deployment object to its previous state. Where the apply
data is stored is an implementation detail they may have no awareness of, and there's nothing that signals to them that the rollback will be incomplete in an important way. Also unlike the CRUD imperative commands, rollout
does not strike me as something we can call intended for experimental rather than production use. On the contrary, it sounds like something you should be able to reach for in an emergency.
Points to reflect on and come back to at next week's meeting:
cc @seans3 @eddiezane @ant31
One more note: the core k8s docs promote rollout undo
as the way to roll back changes to a Deployment or DaemonSet, no caveats provided. In fact, the DaemonSet version says:
kubectl rollout undo is equivalent to updating DaemonSet template to a previous revision through other commands, such as kubectl edit or kubectl apply.
cc @janetkuo for your thoughts as the original author of this feature
kubectl rollout undo is an imperative command, similar to kubectl edit, and was not intended to be used with kubectl apply in its original design. For kubectl apply users, modifying the configs (e.g. reverting the changes) and then applying them through kubectl apply is the declarative way to do a rollback.
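In terms of the reproduction above, that declarative rollback would be a sketch like this, re-applying the older manifest instead of running rollout undo:
# revert to the previous known-good config by applying it again,
# which also keeps last-applied-configuration consistent
kubectl apply -f sample.yaml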
As promised, we discussed this again at today's SIG-CLI meeting. You can watch the conversation starting at https://youtu.be/l2plzJ9MRlk?t=2517.
The conclusion was that the distinction between imperative and declarative workflows is important and justifies the related stance that this command is behaving as expected and should not be changed. That said, we're not doing a good enough job of communicating that distinction to users and helping them avoid problems with the interaction between the two strategies. Here are the potential action items that came out of the conversation:
- Add a note to the rollout undo docs about the incompatibility with apply.
- Detect when the target of rollout undo was configured with apply and emit a warning, potentially referring to the above docs. We already do something similar when you apply a resource that was created with create.
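For reference, the existing create-then-apply behavior mentioned in the second item can be reproduced along these lines (a sketch; demo.yaml and the pause image are illustrative, and the exact warning text may differ between versions):
kubectl create deployment demo --image=gcr.io/google_containers/pause-amd64:3.0 --dry-run=client -o yaml > demo.yaml
kubectl create -f demo.yaml     # imperative create: no last-applied annotation is stored
kubectl apply -f demo.yaml      # apply warns about the missing annotation and patches it in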
If kubectl rollout undo is not intended to work with kubectl apply then it should remove kubectl apply's annotation, not leave it in an incorrect state.
Would there be appetite for the "expected" behaviour under a different command that could be marked as "safe for use with the declarative workflow"? Perhaps something like kubectl apply rollback
?
How would an imperative workflow configure a resource that could then safely be targeted by rollout undo
, if not with apply
?
If kubectl rollout undo is not intended to work with kubectl apply then it should remove kubectl apply's annotation, not leave it in an incorrect state.
Unfortunately that would exacerbate the problem for the users this affects, since the result would be apply
not owning any of the fields it previously did as of the rollback.
Would there be appetite for the "expected" behaviour under a different command that could be marked as "safe for use with the declarative workflow"? Perhaps something like kubectl apply rollback?
No, because the declarative way to roll back is to re-apply the previous state, as @soltysh reiterated in the meetings.
How would an imperative workflow configure a resource that could then safely be targeted by rollout undo, if not with apply?
Apply by definition is not imperative. If you choose imperative configuration management as your strategy (create/edit/patch/replace/set etc.), then this bug report is irrelevant to you. The problem only happens if you are already using apply, then use rollout undo, then use apply again. The docs contrasting these techniques are here: https://kubernetes.io/docs/concepts/overview/working-with-objects/object-management.
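For illustration, against the test-app Deployment from the reproduction above, the two styles look roughly like this (a sketch; the image tag change is arbitrary):
# imperative: mutate the live object directly
kubectl set image deployment/test-app bar=gcr.io/google_containers/pause-amd64:3.1
# declarative: edit the manifest, then re-apply it
kubectl apply -f sample-new.yaml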
No, because the declarative way to roll back is to re-apply the previous state, as @soltysh reiterated in the meetings.
That makes sense if the previous state is readily available. If it isn't available from some other source (e.g. git history or a past helm chart etc) and it is available through the last-applied-configuration
annotation, it feels like an invaluable convenience to be able to rollback in this manner.
That makes sense if the previous state is readily available. If it isn't available from some other source (e.g. git history or a past helm chart etc) and it is available through the last-applied-configuration annotation, it feels like an invaluable convenience to be able to rollback in this manner.
It isn't ever available in the last-applied annotation. That only contains the state that was applied in creating the current version. No information is stored about the state applied to create any previous versions. That's why this incompatibility with apply exists and could not be resolved without a massive increase in storage usage, as explained in my first comment.
Unfortunately that would exacerbate the problem for the users this affects, since the result would be apply not owning any of the fields it previously did as of the rollback.
When a procedure is invalid, having it fail frequently and noticeably is better than having it rarely fail. Procedures that reliably fail teach people to not perform them. Things that rarely fail, in subtle ways, are what lead to disasters.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Mark this issue as rotten with /lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
I'm going to reopen this so I can transfer it to the kubectl repo.
/reopen
@sftim: Reopened this issue.
/transfer kubectl
/remove-lifecycle rotten /lifecycle stale
/assign @seans3
@seans3 will be adding a warning about a possible conflict when using rollout undo when a last applied annotation is present.
/remove-lifecycle stale /lifecycle frozen
/triage accepted
NOTE: As @eddiezane said, this effort will not be attempting to facilitate the interaction of kubectl rollout (an imperative command) with kubectl apply (a declarative command); it will only add a warning message to not mix the two commands.
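Until such a warning exists, a user can check whether a Deployment is apply-managed before running rollout undo (a sketch against the test-app Deployment above; view-last-applied fails if the annotation is missing):
kubectl apply view-last-applied deployment/test-app >/dev/null 2>&1 \
  && echo "managed by kubectl apply: rollout undo will leave last-applied out of sync"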
/retitle kubectl rollout undo should warn about undefined behaviour with kubectl apply
(I think it is misleading to accept this as previously titled; alternatively we can revert the title and the triage and create a separate issue for the warning)
Down the line, it might be a good idea to consider adding a similar warning when you mix any command from different groups. You could for example annotate the deployment with, say, command-group: declarative or command-type: imperative and warn based on that, or on some other metadata if it exists somewhere.
With that said, rollout undo is probably one of the more common and/or important cases, so it makes sense to start there.
This issue is now about issuing a specific warning. Feel free to log separate issues for any different improvements folks would like to suggest.
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
What would you like to be added: Make kubectl rollout undo and kubectl apply work together. kubectl rollout undo doesn't revert the kubectl.kubernetes.io/last-applied-configuration annotation, which leads to problems if named objects in arrays are intended to be removed.
Why is this needed: I ran into the issue that I had an old container still being present after applying a rename of a sidecar. Symptoms look like these issues: https://github.com/kubernetes/kubernetes/issues/63982 and https://github.com/kubernetes/kubernetes/issues/25390.
What happened to me is as follows:
- Start with a deployment with containers a + b
- Rename b to c and change c's implementation
- kubectl apply with the changed yaml, containing a version of c which has an error
- kubectl rollout undo
- The kubectl.kubernetes.io/last-applied-configuration annotation on the deployment's metadata is not rolled back
- Apply the a + c yaml for the deployment; b is not in the last applied configuration, so the kubernetes config merge algorithm doesn't remove b
- The deployment now contains a + b + c
The rollout undo command doesn't roll back kubectl.kubernetes.io/last-applied-configuration, which doesn't make sense imho. I reported this to Google Support, with a reproduction; their answer was this:
There are a lot of cases where the old config is no longer available to the user, especially since rollout undo is only done when there is something going wrong. At these times you want kubectl to be your friend who has your back…