operator-framework / java-operator-sdk

Java SDK for building Kubernetes Operators
https://javaoperatorsdk.io/
Apache License 2.0
786 stars 212 forks source link

CPU resource unit comparison error #2509

Open 10000-ki opened 3 weeks ago

10000-ki commented 3 weeks ago

Bug Report

In Kubernetes, some resource units are automatically converted to a more convenient format internally. For example:

resources:
    limits:
      cpu: 2000m

Even if the CPU is set to 2000m, when you retrieve this configuration using kubectl, you might see:

resources:
    limits:
      cpu: 2

This indicates that the unit has been converted.

The problem arises when an Operator sets the CPU unit as 2000m, but the Kubernetes API server keeps converting it back to 2. This leads to a situation where, during the DependentResource.match process, the Operator and the API server identify the resources as different, causing match to return false.

스크린샷 2024-08-21 오후 5 07 14

The reason for this issue is that when comparing the actual and desired states, the values are checked through simple string comparison.

As a result, the operator keeps sending patch requests, causing the resourceVersion to increase infinitely.

Environment

$ java -version

17

$ kubectl version

v1.24

Possible Solution

Quantity("2000m") == Quantity("2")

In fact, when using Quantity in fabric8 for comparison, it can be determined that these two values are the same. It might be better to use Quantity for resource comparisons. However, given the current structure, this doesn't seem straightforward. Do you think it might be possible?

10000-ki commented 3 weeks ago

This is also same for storage unit comparisons between Gi and Ti. For example: Quantity("1024Gi") == to Quantity("1Ti").

metacosm commented 3 weeks ago

The problem is that the matcher is generic so it'd have to introspect the structure to figure out the type and do the appropriate conversions before checking for equality so it might be difficult to do in a generic way.

metacosm commented 3 weeks ago

One option could be to have specific matchers for specific types to avoid this issue but that would add quite a maintenance burden as we'd need to ensure that these matchers are properly updated whenever the associated type changes in Kubernetes

10000-ki commented 3 weeks ago

@metacosm

One option could be to have specific matchers for specific types to avoid this issue but that would add quite a maintenance burden as we'd need to ensure that these matchers are properly updated whenever the associated type changes in Kubernetes

yes right

While it is possible to compare resources like StatefulSet and Deployment separately, I also anticipate that it would be challenging to handle this across different K8S versions.

I will also take some time to think about whether there might be a better approach.

csviri commented 2 weeks ago

It would be good to handle such cases in the framework, unfortunatelly the current SSA Matcher is not prepared for that. But would be nice to do such experiments, and add such fixes even on-demand bases.

10000-ki commented 1 week ago

It would be good to handle such cases in the framework, unfortunatelly the current SSA Matcher is not prepared for that. But would be nice to do such experiments, and add such fixes even on-demand bases.

yes right so i will try to do some experiments to find if there is a good way.

Donnerbart commented 1 week ago

We are facing the same issue and solved this by actually creating our desired resource in the format that K8s would transform into. With this approach the actual and desired are the same and the built-in matchers work (SSA and also the generic one).

This is also same for storage unit comparisons between Gi and Ti. For example: Quantity("1024Gi") == to Quantity("1Ti").

Good to know, I think I have to build that into our utility class then.

Our use case is a CRUDKubernetesDependentResource<StatefulSet, OurCustomResource> that is created from a StatefulSetSpec that we have in our custom resource. In the desired StatefulSet we go over the containers and modify the resources like this:

        if (container.getResources() != null) {
            convertResourcesQuantity(container.getResources().getRequests(),
                    (requests) -> container.getResources().setRequests(requests));
            convertResourcesQuantity(container.getResources().getLimits(),
                    (limits) -> container.getResources().setLimits(limits));
        }

This is the utility class:

import io.fabric8.kubernetes.api.model.Quantity;
import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;

import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class QuantityUtil {

    private QuantityUtil() {
    }

    public static void convertResourcesQuantity(
            final @Nullable Map<String, Quantity> resource,
            final @NotNull Consumer<Map<String, Quantity>> resourceConsumer) {
        if (resource == null || resource.isEmpty()) {
            return;
        }
        final var newResource = new HashMap<String, Quantity>(resource.size());
        resource.forEach((key, quantity) -> {
            // workaround to deal with the implicit conversion of, e.g. "cpu=1000m" to "cpu=1" or "cpu=0.5" to
            // "cpu=500m" in Fabric8 and K8s, that lead to constant mismatches between the desired and actual
            // values (triggers infinite rolling restarts)
            if ("m".equals(quantity.getFormat())) {
                // a Quantity of, e.g. 512m or 1001m has an intValue() of 0 or 1 (the fractional part is lost), so
                // we can only replace the Quantity if the full numerical amount of the candidate is still the same
                final var numericalAmount = quantity.getNumericalAmount();
                final var candidate = new Quantity(String.valueOf(numericalAmount.intValue()));
                if (numericalAmount.compareTo(candidate.getNumericalAmount()) == 0) {
                    newResource.put(key, candidate);
                    return;
                }
            } else {
                // a Quantity of e.g. 0.5 has a fractional part, so we can only replace the Quantity if the intValue()
                // of the numerical amount doesn't equal the original amount, and the full numerical amount of the
                // candidate is still the same
                final var numericalAmount = quantity.getNumericalAmount();
                if (!quantity.getAmount().equals(String.valueOf(numericalAmount.intValue()))) {
                    final var milliCoreAmount = Math.round(numericalAmount.floatValue() * 1000);
                    final var candidate = new Quantity(String.valueOf(milliCoreAmount), "m");
                    if (numericalAmount.compareTo(candidate.getNumericalAmount()) == 0) {
                        newResource.put(key, candidate);
                        return;
                    }
                }
            }
            newResource.put(key, quantity);
        });
        resourceConsumer.accept(newResource);
    }
}
class QuantityUtilTest {

    private final @NotNull AtomicReference<Map<String, Quantity>> resourcesRef = new AtomicReference<>();
    private final @NotNull Consumer<Map<String, Quantity>> resourceConsumer = resourcesRef::set;

    @Test
    @Timeout(10)
    void convertResourcesQuantity_withNullResource_doNotApplyResources() {
        QuantityUtil.convertResourcesQuantity(null, resourceConsumer);
        assertThat(resourcesRef).hasNullValue();
    }

    @Test
    @Timeout(10)
    void convertResourcesQuantity_withEmptyResource_doNotApplyResources() {
        QuantityUtil.convertResourcesQuantity(Map.of(), resourceConsumer);
        assertThat(resourcesRef).hasNullValue();
    }

    @Test
    @Timeout(10)
    void convertResourcesQuantity_withConvertableMilliFormat_applyConvertedResources() {
        // test Quantity with and without a separate format argument
        final var resource = Map.of("cpu1", new Quantity("1000m"), "cpu2", new Quantity("2000", "m"));
        QuantityUtil.convertResourcesQuantity(resource, resourceConsumer);
        assertThat(resourcesRef).hasValueSatisfying(resources -> assertThat(resources).hasSize(2)
                .containsEntry("cpu1", new Quantity("1"))
                .containsEntry("cpu2", new Quantity("2")));
    }

    @Test
    @Timeout(10)
    void convertResourcesQuantity_withNonConvertableMilliFormat_applyOriginalResources() {
        // test Quantity with and without a separate format argument
        final var resource = Map.of("cpu1", new Quantity("42m"), "cpu2", new Quantity("2342", "m"));
        QuantityUtil.convertResourcesQuantity(resource, resourceConsumer);
        assertThat(resourcesRef).hasValueSatisfying(resources -> assertThat(resources).hasSize(2)
                .containsEntry("cpu1", new Quantity("42m"))
                .containsEntry("cpu2", new Quantity("2342m")));
    }

    @Test
    @Timeout(10)
    void convertResourcesQuantity_withConvertableFormat_applyConvertedResources() {
        final var resource = Map.of("cpu", new Quantity("0.42"));
        QuantityUtil.convertResourcesQuantity(resource, resourceConsumer);
        assertThat(resourcesRef).hasValueSatisfying(resources -> assertThat(resources).hasSize(1)
                .containsEntry("cpu", new Quantity("420m")));
    }

    @Test
    @Timeout(10)
    void convertResourcesQuantity_withNonConvertableFormat_applyOriginalResources() {
        final var resource = Map.of("cpu", new Quantity("0.5"));
        QuantityUtil.convertResourcesQuantity(resource, resourceConsumer);
        assertThat(resourcesRef).hasValueSatisfying(resources -> assertThat(resources).hasSize(1)
                .containsEntry("cpu", new Quantity("0.5")));
    }
}

I know this is far from being perfect, but it solved the infinite update loop issue for us. But of course I'd love to see this fixed in the operator SDK or even the fabric8 client, so there are more eyes on this code (e.g. to cover other missing cases like the storage conversion).

Donnerbart commented 1 week ago

I'm really bad at reading GoLang code. Could someone help me find the code in K8s that does this transformation? Maybe we can re-implement that 1:1 in Java to have the same conversion code as Java utility. This of course might need maintenance, but it would at least be a better workaround than my util, that was just built upon observations.

metacosm commented 1 week ago

Maybe we should add your utility method (or some form of it) on the Quantity class in the client, @Donnerbart?

metacosm commented 1 week ago

I'm really bad at reading GoLang code. Could someone help me find the code in K8s that does this transformation? Maybe we can re-implement that 1:1 in Java to have the same conversion code as Java utility. This of course might need maintenance, but it would at least be a better workaround than my util, that was just built upon observations.

The code seems to be here: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/api/resource/quantity.go

Donnerbart commented 1 week ago

I think that would be a great solution. Maybe something like public Quantity asKubernetesFormat() for the Quantity class and public A asKubernetesFormat() for the QuantityBuilder? Or maybe asKubernetesSerializationFormat()?

(Note that 1024 = 1Ki but 1000 = 1k; I didn't choose the capitalization.)

I love that comment on the K8s code :sweat_smile:

Since I anyway have to improve my utility class, I could make a contribution to the fabric8 client (assuming that I can copy the whole formatting logic from the GoLang code into Java).

metacosm commented 1 week ago

I think that would be a great solution. Maybe something like public Quantity asKubernetesFormat() for the Quantity class and public A asKubernetesFormat() for the QuantityBuilder? Or maybe asKubernetesSerializationFormat()?

Yes. Please open an issue on the client's repo

(Note that 1024 = 1Ki but 1000 = 1k; I didn't choose the capitalization.)

I love that comment on the K8s code 😅

😁

Since I anyway have to improve my utility class, I could make a contribution to the fabric8 client (assuming that I can copy the whole formatting logic from the GoLang code into Java).

That would be ideal and we would appreciate the contribution! 😉

10000-ki commented 5 days ago

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/api/resource/quantity.go

I also agree with the opinion that the quantity parsing part code is implemented in the same way on the josdk side.