kubernetes-sigs / structured-merge-diff

Test cases and implementation for "server-side apply"
Apache License 2.0
97 stars 58 forks source link

Lenient validation and merging for non-strict `listType=map` #234

Closed apelisse closed 11 months ago

apelisse commented 1 year ago

Right now, we have some lists that are marked as listType=map that are not strictly maps in Kubernetes. In that case, everything goes wrong in smd because the list is invalid (and can't be handled), so it's fully rejected. We can't even track the fields, which causes us to fail and return an error which is ignored by Kubernetes, and the managed fields are entirely reset.

Unfortunately, it means that objects that have such a list can't be server-side applied at all. We think it'd be simpler to merely ignore these (fairly rare) occurences and try to do the best thing possibe rather than fail.

This means:

  1. When we validate an object with such a list, we should probably not fail altogether (though we should probably reject a request to server-side apply an object like this).
  2. When tracking the fields, we should give duplicate fields to the same owner.
  3. When server-side applying without the field, we need to be careful about how the field should be removed.
  4. When we merge a list that has such a field, we need to be careful, especially if the server-side apply contains the field (should we remove both other occurences? Probably).
apelisse commented 1 year ago

cc @thockin @jpbetz

jpbetz commented 1 year ago

Do we have the list of fields that fall into this "marked as listType=map but not validated as having unique keys" category?

apelisse commented 1 year ago

Not really but, that's a great question:

  1. I'm pretty sure CRDs don't allow that and never will, so there's no chance people start abusing that mechanism there (IOW, actual validation matches SSA validation)
  2. Since it's only in built-ins, and because a lot of these were not designed for that initially (unfortunately), we have: environment variables and a few containerports.

I considered adding a new, separate tag, but that would require a KEP and make things impossible to backport. I think we need to make it clear that this is a work-around and shouldn't be used for any new list (again, only for built-ins so we have a lot more control over that).

thockin commented 1 year ago

There are two distinct failure modes, the one we have usually run into is client-side apply vs. Service.spec.ports, where the keys are looser than validation, and SSA is the SOLUTION (the keys match validation).

The more recent one is where SSA's keys are more strict than validation. So far we've found:

I admit that I haven't scoured the API for this case, and I don't honestly know how I would in any reasonable amount of time. :( IOW, there's ikely to be more

jpbetz commented 1 year ago

I did a scan. Here's what I found:

A second pair of eyes on this would be really helpful. I only looked at the validation code. I haven't actually tried all these cases.

apelisse commented 1 year ago

I wouldn't be surprised if some people think this will automatically write the validation for them. For most people authoring new resources and using this, it should tbh. Maybe we should generate and have an opt-out flag?

jpbetz commented 1 year ago

Maybe we should generate and have an opt-out flag?

https://github.com/kubernetes/enhancements/pull/3938 proposes that we automatically validate. An opt-out flag sounds useful.

For these 4 ports cases, should we chnage listMapKey to 'name' to match what is validated? Can SSA handle a map key change?

apelisse commented 1 year ago

For these 4 ports cases

Which 4?

If you mean:

Container.Ports - validates unique name, listMapKey=containerPort, protocol

I don't think that's correct, name is unique but I don't know if that's good enough, name isn't required.

jpbetz commented 1 year ago

I don't think that's correct, name is unique but I don't know if that's good enough, name isn't required.

Ah, we don't have a key then, because name is the only field that validation checks when looking for duplicates.

We might need to go with atomic?

apelisse commented 1 year ago

Actually, if we're careful we might be able to make it work with name, especially if it's omitempty, we can default it to "". Then as you mentioned, we would never have two keys with the same name.

apelisse commented 1 year ago

If that works by the way, that might be a great way to fix a lot of these problems. For env, we might be able to make a key with name AND value. Then we can only have one such pair, I think that's fine, we can still merge the list properly.

jpbetz commented 1 year ago

If that works by the way, that might be a great way to fix a lot of these problems. For env, we might be able to make a key with name AND value. Then we can only have one such pair, I think that's fine, we can still merge the list properly.

I've love to see a test suite trying out various operations on a listType=map defined this way. For CEL in CRDs we defined map key to be the CEL expression and it works just fine...

apelisse commented 1 year ago

I've love to see a test suite trying out various operations on a listType=map defined this way.

We have a lot of tests in this repo, what do you have in mind? We already have multi-key tests here, that should work fine. We can reason and see what happens (and maybe fix) when we change the keys, I think we had thought about this a while back.

jpbetz commented 1 year ago

We can reason and see what happens (and maybe fix) when we change the keys, I think we had thought about this a while back.

Good point.

Maybe open a PR that tries to fix the broken cases and we can manually try various apply requests out?

ObjectMeta.OwnerReferences worries me..

thockin commented 1 year ago

Actually, if we're careful we might be able to make it work with name, especially if it's omitempty, we can default it to "". Then as you mentioned, we would never have two keys with the same name.

For ContainerPort, the name is optional. If specified, it must be unique, but many ports may be named "".

thockin commented 1 year ago

Ports: Use enough fields to ensure uniqueness in each case? E.g. all the fields of ContainerPort?

Sadly, we allow total duplicates today, as long as hostPort and hostIP are both unspecified. It's meaningless, but allowed.

thockin commented 1 year ago

PodSpec.TopologySpreadConstraints - validation missing, listMapKey=whenUnsatisfiable

I see topologyKey + whenUnsatisfiable but I find validation code in validateTopologySpreadConstraints()

ServiceSpec.Ports - same as ContainerPorts (except keys are port, protocol)

This is validated in ValidateService()

    // Check for duplicate Ports, considering (protocol,port) pairs
    portsPath = specPath.Child("ports")
    ports := make(map[core.ServicePort]bool)
    for i, port := range service.Spec.Ports {
        portPath := portsPath.Index(i)
        key := core.ServicePort{Protocol: port.Protocol, Port: port.Port}
        _, found := ports[key]
        if found {
            allErrs = append(allErrs, field.Duplicate(portPath, key))
        }    
        ports[key] = true 
    }    

PodSpec.ImagePullSecretes - validation missing, patchMergeKey=name

Agree there's no validation - I think this should be atomic

PodSpec.HostAliases - validation missing, patchMergeKey=ip

Agree there's no validation - I think this should be atomic

ServiceAccount.Secrets - validation missing, patchMergeKey=name

Agree there's no validation - I think this should be atomic

ObjectMeta.OwnerReferences - validation missing, patchMergeKey=uid

Agree this is not validated unique, but it is validated non-empty. This one is hard because I can see people wanting to co-own it.

LabelSelectorRequirement.Key - validation missing, patchMergeKey=key

This field is a string - what does it even mean to have a merge key?

apelisse commented 1 year ago

To be clear, most lists should not be atomic. Atomic should be used for things that can't be a mix of multiple things, or that are very tightly coupled, kind of like a number (wouldn't make sense to let different people change different digits), or a string (we should never let different actors change different parts of a string).

We should try really really hard to not make these lists atomic and give them a good way to be updated by multiple actors with different opinions. Here is a litmus test: if two actors (users and/or controllers) can have different opinions of a list that are not mutually exclusive, then the list shouldn't be atomic.

Can two users want different environment variables to be set? Can two users want different sets of ContainerPorts? Can they each have their secret? Can you have multiple ownerReferences, can you have multiple LabelSelector etc... None of them should be atomic.

Actually, atomic is breaking Ci/CD systems so much that I'd rather find a work-around to make things not completely break than make them atomic.

apelisse commented 1 year ago

LabelSelectorRequirement.Key - validation missing, patchMergeKey=key

This field is a string - what does it even mean to have a merge key?

That has to be a bug 😂

jpbetz commented 1 year ago

For StorageVersion: https://github.com/kubernetes/kubernetes/pull/118520

jpbetz commented 1 year ago

That has to be a bug 😂

It is. It got added by https://github.com/kubernetes/kubernetes/pull/44121 but then got missed when the bad merge keys were cleaned up by https://github.com/kubernetes/kubernetes/pull/50257.

Here's a fix: https://github.com/kubernetes/kubernetes/pull/118522

thockin commented 1 year ago

Antoine, I hear what you are saying but look at these fields. I think most of them are exceedingly unlikely to be co-owned, but I admit some might be.

For some, like ContainerPort, we could add all the fields, if only SSA would not barf on it (but could dedup it). I don't know if that would work on any others.

Env, as a pathological example, is just NOT a map. It's not. It is sequential. We would need some sort of append-merge.

On Tue, Jun 6, 2023, 6:21 PM Joe Betz @.***> wrote:

That has to be a bug 😂

It is. It got added by kubernetes/kubernetes#44121 https://github.com/kubernetes/kubernetes/pull/44121 but then got missed when the bad merge keys were cleaned up by kubernetes/kubernetes#50257 https://github.com/kubernetes/kubernetes/pull/50257.

Here's a fix: kubernetes/kubernetes#118522 https://github.com/kubernetes/kubernetes/pull/118522

— Reply to this email directly, view it on GitHub https://github.com/kubernetes-sigs/structured-merge-diff/issues/234#issuecomment-1579700882, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKWAVFZA3MXW6UGXM75LC3XJ7JQVANCNFSM6AAAAAAY3RQOD4 . You are receiving this because you were mentioned.Message ID: @.***>

jpbetz commented 1 year ago

~Fix for container and ephemeralContainer ports: https://github.com/kubernetes/kubernetes/pull/118536~

EDIT: container fix will be more challenging than I originally realized. Identical container ports are allowed unless a name or hostPort is set. Problematic example (which I just verified is supported on a cluster):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80
    - containerPort: 80
thockin commented 1 year ago

xref: https://github.com/kubernetes/kubernetes/pull/118475

jpbetz commented 1 year ago

I've verified that all remaining cases (env, ports, imagePullSecrets, hostAliases, serviceAccount.Secrets, ownerReferences) allow exact duplicates. This means these none of these cases cannot be fixed solely by changing the merge keys.

This simplifies things in my mind because it means we either have to handle exact duplicates in SSA, or we have to use atomic.

It is interesting that ownerReferences is deprecating duplicates. The warning it returns for duplicates is:

Warning: .metadata.ownerReferences contains duplicate entries; API server dedups owner references in 1.20+, and may reject such requests as early as 1.24; please fix your requests; duplicate UID(s) observed: 421f3acc-5bfd-4007-b28a-4b0749c95888

ports and serviceAccount.secrets have no warnings. We should add those.

Should we also consider if some fields should have a warning that we will stop allowing exact duplicates in the future?

thockin commented 1 year ago

I am skeptical that we can or should ever reject duplicates, but yes to warnings.

jpbetz commented 1 year ago

I am skeptical that we can or should ever reject duplicates, but yes to warnings.

SGTM. When adding the warnings for the fields that do not yet have warnings, we could also update the warning for ownerReferences to stop saying that we might start rejecting requests in 1.24 (since it's over a year ago anyway).

rmohr commented 1 year ago

From the discussion above it is not entirely clear on what you will consider valid on container ports on pods. Will duplicate portnumbers with different port names still be considered valid? They are useful to aggregate pods to different services for different functions without having to expose a different port (requiring app changes).

thockin commented 1 year ago

That depends on the operation. create and replace will preserve pod-ports with the same number but different name. apply will merge them based on port number+protocol (which is how the merge-key is defined)

apelisse commented 11 months ago

also related to https://github.com/kubernetes/kubernetes/issues/113482

apelisse commented 11 months ago

Fixed by #254