Warning: I'm opening the issue here, but the problem is actually inside the k8s-objects library. More specifically, we will have to fix k8s-objects-generator and regenerate all the objects.
Current Behavior
Kubewarden policies written using our Go k8s-objects library end up producing a PATCH response that breaks the object.
More specifically, metadata.deletionTimestamp is added with the zero date. This causes the resulting object to go straight into deletion. Moreover, this date seems to cause trouble, since the object stays in the deleting state forever.
This bug was originally spotted by @enrichman.
Expected Behavior
The patched resource should not be left in a broken state. More specifically, the policy should end up patching only the fields that are relevant to it.
Steps To Reproduce
1. Deploy the Kubewarden stack; to reduce the noise, ensure no policy is defined.
2. Add a label to the default namespace.
3. Deploy the namespace-label-propagator-policy, which happens to be written in Go and uses our k8s-objects. This policy will patch the Pod objects defined inside of the default namespace.
4. Create a simple Pod. The Pod will be successfully created and patched by the policy.
5. However, the Pod will be left in a broken state; the kubectl describe command shows the corresponding event.

The problem resides inside the k8s-objects library; any mutating policy written in Go that makes use of it is broken.
Anything else?
This section contains the debugging procedure I used and the root cause of the bug.
Start the Policy Server in debug mode, manually delete the broken nginx pod and create it again. Look at the policy-server logs and find the response provided by the policy. Copy the JSONPatch and decode it by doing echo <copied text> | base64 -d | jq.
As we can see, the policy is making quite a few changes to the original object. The only change that "makes sense" is the addition of the new label to /metadata/labels.
The troublesome change is the addition of the /metadata/deletionTimestamp value, with this zero datetime. This is what causes the object to immediately enter the termination phase. On top of that, its odd value causes the pod to be left stuck in the deletion phase.
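To make the mechanism concrete, here is a minimal Go sketch of the round trip a mutating policy performs; the Time and ObjectMeta types are trimmed-down stand-ins (Time is assumed to wrap time.Time), not the generated code itself. Deserializing and re-serializing the object is enough to materialize the zero timestamps, which then show up as extra JSONPatch operations:

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Stand-ins for the generated types: Time is assumed to wrap time.Time, and
// ObjectMeta is trimmed down to the fields relevant to this issue.
type Time struct{ time.Time }

type ObjectMeta struct {
	Labels            map[string]string `json:"labels,omitempty"`
	CreationTimestamp Time              `json:"creationTimestamp,omitempty"`
	DeletionTimestamp Time              `json:"deletionTimestamp,omitempty"`
}

func main() {
	// The object carried by the admission request has no timestamps at all.
	incoming := []byte(`{"labels":{"app":"nginx"}}`)

	var meta ObjectMeta
	if err := json.Unmarshal(incoming, &meta); err != nil {
		panic(err)
	}

	// The policy only wants to add a label...
	meta.Labels["propagated-label"] = "true"

	// ...but re-serializing the struct also emits the zero-value timestamps,
	// so a JSONPatch computed against the incoming object gains spurious
	// "add" operations for /metadata/creationTimestamp and
	// /metadata/deletionTimestamp.
	mutated, _ := json.Marshal(meta)
	fmt.Println(string(mutated))
	// {"labels":{"app":"nginx","propagated-label":"true"},
	//  "creationTimestamp":"0001-01-01T00:00:00Z",
	//  "deletionTimestamp":"0001-01-01T00:00:00Z"}
}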
Looking at the ObjectMeta struct provided by our library, we find the following definition:
CreationTimestamp Time `json:"creationTimestamp,omitempty"`
The problem is that, since this value is a struct, the omitempty option has no effect: encoding/json never considers a struct value to be empty. As a result, we end up serializing the field with its default value (the weird zero datetime).
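This is standard encoding/json behavior: a struct value is never considered empty, while a nil pointer is, which is why the change below fixes the issue. A quick sketch that verifies both cases (again with a stand-in Time type that is assumed to wrap time.Time):

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type Time struct{ time.Time } // stand-in, assumed to wrap time.Time

type asValue struct {
	// A struct value is never "empty" for encoding/json, so omitempty is
	// silently ignored and the zero timestamp is always serialized.
	DeletionTimestamp Time `json:"deletionTimestamp,omitempty"`
}

type asPointer struct {
	// A nil pointer is "empty", so the field is dropped from the output.
	DeletionTimestamp *Time `json:"deletionTimestamp,omitempty"`
}

func main() {
	v, _ := json.Marshal(asValue{})
	p, _ := json.Marshal(asPointer{})
	fmt.Println(string(v)) // {"deletionTimestamp":"0001-01-01T00:00:00Z"}
	fmt.Println(string(p)) // {}
}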
To prove my theory, I manually patched this file:
diff --git a/apimachinery/pkg/apis/meta/v1/object_meta.go b/apimachinery/pkg/apis/meta/v1/object_meta.go
index cc888a6..a3d3933 100644
--- a/apimachinery/pkg/apis/meta/v1/object_meta.go
+++ b/apimachinery/pkg/apis/meta/v1/object_meta.go
@@ -17,7 +17,7 @@ type ObjectMeta struct {
//
// Populated by the system. Read-only. Null for lists. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
// Format: date-time
- CreationTimestamp Time `json:"creationTimestamp,omitempty"`
+ CreationTimestamp *Time `json:"creationTimestamp,omitempty"`
// Number of seconds allowed for this object to gracefully terminate before it will be removed from the system. Only set when deletionTimestamp is also set. May only be shortened. Read-only.
DeletionGracePeriodSeconds int64 `json:"deletionGracePeriodSeconds,omitempty"`
@@ -26,7 +26,7 @@ type ObjectMeta struct {
//
// Populated by the system when a graceful deletion is requested. Read-only. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
// Format: date-time
- DeletionTimestamp Time `json:"deletionTimestamp,omitempty"`
+ DeletionTimestamp *Time `json:"deletionTimestamp,omitempty"`
// Must be empty before the object is deleted from the registry. Each entry is an identifier for the responsible component that will remove the entry from the list. If the deletionTimestamp of the object is non-nil, entries in this list can only be removed. Finalizers may be processed and removed in any order. Order is NOT enforced because it introduces significant risk of stuck finalizers. finalizers is a shared field, any actor with permission can reorder it. If the finalizer list is processed in order, then this can lead to a situation in which the component responsible for the first finalizer in the list is waiting for a signal (field value, external system, or other) produced by a component responsible for a finalizer later in the list, resulting in a deadlock. Without enforced ordering finalizers are free to order amongst themselves and are not vulnerable to ordering changes in the list.
Finalizers []string `json:"finalizers,omitempty"`
Then I checked out the latest version of the namespace-label-propagator-policy and patched its go.mod to consume the patched k8s-objects checkout I had on my computer.
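In practice this is just a standard Go replace directive along these lines (the module path and the local path below are illustrative assumptions):

// In the policy's go.mod: point the k8s-objects dependency at the local
// checkout that contains the *Time fix. Paths are assumptions for illustration.
replace github.com/kubewarden/k8s-objects => ../k8s-objects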
I then rebuilt the policy and pushed it to a private registry.
I then created a new namespace called foo and added a label to it, created a new ClusterAdmissionPolicy scoped to the foo namespace that consumed the patched policy, and created the same Pod inside the foo namespace. This time the Pod was not left in a broken state.
Looking at the patch produced by the patched policy, we can see that it is no longer messing with the deletion timestamp. The creation timestamp is now removed (previously it was set to the zero datetime), but this is not a problem, because the API server sets this value once ALL the admission controllers are done with the validation/mutation of a resource.