quarkusio / quarkus


Kubernetes Client fails to create a new Pod in native executable because `Pod Overhead set without corresponding RuntimeClass defined Overhead.` #39934

Closed: scholzj closed this issue 2 months ago

scholzj commented 6 months ago

Describe the bug

I have an application that creates a Pod in a Kubernetes cluster. The Pod is very simple: it contains only a volume with a mounted ConfigMap and a single container. It works fine when I run it with mvn quarkus:dev, but it fails after native compilation with the following error:

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://192.168.1.86:6443/api/v1/namespaces/myproject/pods. Message: pods "kekspose" is forbidden: pod rejected: Pod Overhead set without corresponding RuntimeClass defined Overhead. Received status: Status(apiVersion=v1, code=403, details=StatusDetails(causes=[], group=null, kind=pods, name=kekspose, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "kekspose" is forbidden: pod rejected: Pod Overhead set without corresponding RuntimeClass defined Overhead, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Forbidden, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:507)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:524)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:340)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:753)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:97)
    at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1148)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:97)
    at cz.scholz.kekspose.Proxy.deployProxy(Proxy.java:81)
    at cz.scholz.kekspose.Kekspose.run(Kekspose.java:67)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
    at io.quarkus.picocli.runtime.PicocliRunner$EventExecutionStrategy.execute(PicocliRunner.java:26)
    at picocli.CommandLine.execute(CommandLine.java:2170)
    at io.quarkus.picocli.runtime.PicocliRunner.run(PicocliRunner.java:40)
    at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:132)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:44)
    at io.quarkus.runner.GeneratedMain.main(Unknown Source)
    at java.base@21.0.2/java.lang.invoke.LambdaForm$DMH/sa346b79c.invokeStaticInit(LambdaForm$DMH)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://192.168.1.86:6443/api/v1/namespaces/myproject/pods. Message: pods "kekspose" is forbidden: pod rejected: Pod Overhead set without corresponding RuntimeClass defined Overhead. Received status: Status(apiVersion=v1, code=403, details=StatusDetails(causes=[], group=null, kind=pods, name=kekspose, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "kekspose" is forbidden: pod rejected: Pod Overhead set without corresponding RuntimeClass defined Overhead, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Forbidden, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:660)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:640)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.assertResponseCode(OperationSupport.java:589)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.lambda$handleResponse$0(OperationSupport.java:549)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
    at io.fabric8.kubernetes.client.http.StandardHttpClient.lambda$completeOrCancel$10(StandardHttpClient.java:143)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
    at io.fabric8.kubernetes.client.http.ByteArrayBodyHandler.onBodyDone(ByteArrayBodyHandler.java:52)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base@21.0.2/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
    at io.fabric8.kubernetes.client.vertx.VertxHttpRequest.lambda$null$1(VertxHttpRequest.java:122)
    at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:279)
    at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:261)
    at io.vertx.core.http.impl.HttpEventHandler.handleEnd(HttpEventHandler.java:76)
    at io.vertx.core.http.impl.HttpClientResponseImpl.handleEnd(HttpClientResponseImpl.java:250)
    at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.lambda$new$0(Http1xClientConnection.java:421)
    at io.vertx.core.streams.impl.InboundBuffer.handleEvent(InboundBuffer.java:255)
    at io.vertx.core.streams.impl.InboundBuffer.write(InboundBuffer.java:134)
    at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.handleEnd(Http1xClientConnection.java:709)
    at io.vertx.core.impl.ContextImpl.execute(ContextImpl.java:313)
    at io.vertx.core.impl.ContextImpl.execute(ContextImpl.java:293)
    at io.vertx.core.http.impl.Http1xClientConnection.handleResponseEnd(Http1xClientConnection.java:940)
    at io.vertx.core.http.impl.Http1xClientConnection.handleHttpMessage(Http1xClientConnection.java:810)
    at io.vertx.core.http.impl.Http1xClientConnection.handleMessage(Http1xClientConnection.java:774)
    at io.vertx.core.net.impl.ConnectionBase.read(ConnectionBase.java:159)
    at io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:153)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
    at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base@21.0.2/java.lang.Thread.runWith(Thread.java:1596)
    at java.base@21.0.2/java.lang.Thread.run(Thread.java:1583)
    at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:833)
    at org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:211)

This seems to be caused by the native executable setting the .spec.overhead field to an empty object ({}) instead of keeping it null. When I explicitly set it to null:

            new PodBuilder()
                // ...
                .withNewSpec()
                    // ...
                    .withOverhead(null)
                .endSpec()
                .build();

It seems to work fine.

Expected behavior

The .spec.overhead field stays null even without setting it explicitly in the PodBuilder, and the application is able to create the new Pod even when compiled to a native executable.

Actual behavior

A native executable fails with the error described above.

How to Reproduce?

  1. Write an application that creates a simple Pod object and submits it to a Kubernetes cluster with client.pods().inNamespace(namespace).resource(new PodBuilder()...build()).create(); (see the sketch below)
  2. See that it works without native compilation
  3. Compile it into a native executable and observe that you get the error
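
A minimal sketch of such an application (the names "myproject" and "kekspose" are illustrative; a fabric8 KubernetesClient built from the local kubeconfig is assumed):

    import io.fabric8.kubernetes.api.model.Pod;
    import io.fabric8.kubernetes.api.model.PodBuilder;
    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientBuilder;

    public class Reproducer {
        public static void main(String[] args) {
            try (KubernetesClient client = new KubernetesClientBuilder().build()) {
                // A Pod that never touches .spec.overhead
                Pod pod = new PodBuilder()
                        .withNewMetadata().withName("kekspose").endMetadata()
                        .withNewSpec()
                            .addNewContainer().withName("proxy").withImage("nginx").endContainer()
                        .endSpec()
                        .build();
                client.pods().inNamespace("myproject").resource(pod).create();
            }
        }
    }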

The code where I run into this error can be found here: https://github.com/scholzj/kekspose/blob/main/src/main/java/cz/scholz/kekspose/Proxy.java#L81

Output of uname -a or ver

Darwin Jakubs-MBP 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000 arm64

Output of java -version

openjdk version "21.0.2" 2024-01-16
OpenJDK Runtime Environment GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30)
OpenJDK 64-Bit Server VM GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30, mixed mode, sharing)

Mandrel or GraalVM version (if different from Java)

No response

Quarkus version or git rev

3.9.2

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)

Additional information

No response

quarkus-bot[bot] commented 6 months ago

/cc @Karm (mandrel), @galderz (mandrel), @geoand (kubernetes), @iocanel (kubernetes), @zakkak (mandrel,native-image)

geoand commented 6 months ago

cc @manusa

galderz commented 6 months ago

Hmmm, not sure how this is possible. The only way overhead in PodSpecFluent can be an empty map is by calling addToOverhead and then calling removeFromOverhead. Can you see if something, somewhere is calling addToOverhead when running in native? You can use gdb to try to locate such a caller, or rebuild the fabric8 kubernetes client code to instantiate an exception and print a stacktrace when that method is called.
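
For the second suggestion, a hypothetical instrumentation sketch (the generated PodSpecFluent signature is approximated; this would go into a locally rebuilt client, not the shipped one):

    // Hypothetical change to the generated addToOverhead in a locally
    // rebuilt fabric8 kubernetes-client: dump the caller so the
    // unexpected call site shows up in the logs.
    public A addToOverhead(String key, Quantity value) {
        new Exception("addToOverhead called from:").printStackTrace();
        // ... original generated body unchanged ...
    }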

scholzj commented 6 months ago

You can use gdb to try to locate such a caller, or rebuild the fabric8 kubernetes client code to instantiate an exception and print a stacktrace when that method is called.

I'm afraid I have no idea how to do any of those.

galderz commented 6 months ago

@scholzj I think there's something else going on here. I quickly tried a Quarkus JVM application that invoked this method:

    private Pod pod()   {
        return new PodBuilder()
            .withNewMetadata()
            .withName("boo")
            .withNamespace("boo.namesapce")
            .withLabels(Map.of("app", "kekspose"))
            .endMetadata()
            .withNewSpec()
            .withContainers(new ContainerBuilder()
                .withName("kroxylicious")
                .withImage("proxy.image")
                .withArgs("--config", "/etc/kekspose/proxy-config.yaml")
                .withPorts(List.of())
                .withVolumeMounts(new VolumeMountBuilder().withName("proxy-config").withMountPath("/etc/kekspose/proxy-config.yaml").withSubPath("proxy-config.yaml").build())
                .build())
            .withVolumes(new VolumeBuilder().withName("proxy-config").withNewConfigMap().withName("boo").endConfigMap().build())
            // .withOverhead(null)
            .endSpec()
            .build();
    }

And overhead is {} in JVM mode too. So, no difference there from native.

By the sound of the message Pod Overhead set without corresponding RuntimeClass defined Overhead, maybe some reflection configuration is missing? Maybe the lookup of RuntimeClass, or whatever other class/implementation the message is talking about, fails? That would explain why things work in JVM mode but not in native.

Can you create a self-contained reproducer? Maybe one that doesn't require Kubernetes itself?
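
One cluster-free starting point could be to serialize the Pod locally and inspect the JSON; a rough sketch assuming the fabric8 Serialization utility:

    import io.fabric8.kubernetes.api.model.Pod;
    import io.fabric8.kubernetes.api.model.PodBuilder;
    import io.fabric8.kubernetes.client.utils.Serialization;

    public class OverheadCheck {
        public static void main(String[] args) {
            Pod pod = new PodBuilder()
                    .withNewMetadata().withName("boo").endMetadata()
                    .withNewSpec().endSpec()
                    .build();
            String json = Serialization.asJson(pod);
            System.out.println(json);
            // In a broken build the JSON contains "overhead":{} even
            // though overhead was never set.
            System.out.println("contains overhead: " + json.contains("\"overhead\""));
        }
    }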

mhus commented 5 months ago

Have the same problem here; a simple pod create will add

    "overhead" : {

    },

to the spec.

Code:

apiProvider.getCoreV1Api().createNamespacedPod("default",
                new V1PodBuilder()
                .withNewMetadata()
                .withName("test-pod-" + MDate.toIsoDateTime(System.currentTimeMillis()))
                .endMetadata()
                .withNewSpec()
                .addNewContainer()
                .withName("test-container")
                .withImage("nginx")
                .endContainer()
                .endSpec()
                .build()
        ).execute();

Result:


{
  "spec" : {
    "overhead" : {

    },
    "nodeSelector" : {

    },
    "containers" : [
      {
        "image" : "nginx",
        "name" : "test-container"
      }
    ]
  },
  "metadata" : {
    "annotations" : {

    },
    "labels" : {

    },
    "name" : "test-pod-2024-05-07 20:52:51"
  }
}
20.0.1

mhus commented 5 months ago

Adding .withOverhead(null) did the trick and it's working.

The same can be done with objects:

var body = Yaml.loadAs(yaml, V1Pod.class);
if (body.getSpec().getOverhead() != null && body.getSpec().getOverhead().size() == 0) {
    body.getSpec().setOverhead(null);
}
return apiProvider.getCoreV1Api().createNamespacedPod(body.getMetadata().getNamespace(), body).execute();

galderz commented 5 months ago

@mhus same question I asked @scholzj: can you provide a standalone reproducer?

scholzj commented 5 months ago

@galderz This should help: https://github.com/scholzj/quarkus-39934-reproducer (the instructions are in the README)

galderz commented 4 months ago

Thanks @scholzj, I was eventually able to reproduce it. The issue appears to be a POJO-to-JSON serialization problem:

Looking at HotSpot, the Pod instance is serialized into:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "nginx",
    "namespace": "myproject"
  },
  "spec": {
    "containers": [
      {
        "image": "nginx:1.14.2",
        "name": "nginx",
        "ports": [
          {
            "containerPort": 80
          }
        ]
      }
    ]
  }
}

As shown by:

2024-05-20 07:57:40,339 TRACE [io.fab.kub.cli.htt.HttpLoggingInterceptor] (vert.x-eventloop-thread-1) > User-Agent: fabric8-kubernetes-client/6.11.0
2024-05-20 07:57:40,339 TRACE [io.fab.kub.cli.htt.HttpLoggingInterceptor] (vert.x-eventloop-thread-1) {"apiVersion":"v1","kind":"Pod","metadata":{"name":"nginx","namespace":"myproject"},"spec":{"containers":[{"image":"nginx:1.14.2","name":"nginx","ports":[{"containerPort":80}]}]}}

However, in native the following JSON is generated:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "annotations": {},
    "finalizers": [],
    "labels": {},
    "managedFields": [],
    "name": "nginx",
    "namespace": "myproject",
    "ownerReferences": []
  },
  "spec": {
    "containers": [
      {
        "args": [],
        "command": [],
        "env": [],
        "envFrom": [],
        "image": "nginx:1.14.2",
        "name": "nginx",
        "ports": [
          {
            "containerPort": 80
          }
        ],
        "resizePolicy": [],
        "volumeDevices": [],
        "volumeMounts": []
      }
    ],
    "ephemeralContainers": [],
    "hostAliases": [],
    "imagePullSecrets": [],
    "initContainers": [],
    "nodeSelector": {},
    "overhead": {},
    "readinessGates": [],
    "resourceClaims": [],
    "schedulingGates": [],
    "tolerations": [],
    "topologySpreadConstraints": [],
    "volumes": []
  }
}

E.g.

2024-05-20 08:00:29,794 TRACE [io.fab.kub.cli.htt.HttpLoggingInterceptor] (vert.x-eventloop-thread-1) > User-Agent: fabric8-kubernetes-client/6.11.0
2024-05-20 08:00:29,794 TRACE [io.fab.kub.cli.htt.HttpLoggingInterceptor] (vert.x-eventloop-thread-1) {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"finalizers":[],"labels":{},"managedFields":[],"name":"nginx","namespace":"myproject","ownerReferences":[]},"spec":{"containers":[{"args":[],"command":[],"env":[],"envFrom":[],"image":"nginx:1.14.2","name":"nginx","ports":[{"containerPort":80}],"resizePolicy":[],"volumeDevices":[],"volumeMounts":[]}],"ephemeralContainers":[],"hostAliases":[],"imagePullSecrets":[],"initContainers":[],"nodeSelector":{},"overhead":{},"readinessGates":[],"resourceClaims":[],"schedulingGates":[],"tolerations":[],"topologySpreadConstraints":[],"volumes":[]}}

Not only can we see the overhead being sent and causing issues, but also a lot of other data that really is not needed.

This issue is present as far back as 3.7.4; in 3.6.9 it worked fine (other versions in between could be fine). The Fabric8 version in 3.7.4 was 6.10.0 and in 3.6.9 it was 6.9.2. I remember there were some significant issues with this update. Anything in the update that can explain this @iocanel @metacosm @manusa @shawkins?

galderz commented 4 months ago

Nulling overhead makes the Fabric8 client not serialize that field into the JSON:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "annotations": {},
    "finalizers": [],
    "labels": {},
    "managedFields": [],
    "name": "nginx",
    "namespace": "myproject",
    "ownerReferences": []
  },
  "spec": {
    "containers": [
      {
        "args": [],
        "command": [],
        "env": [],
        "envFrom": [],
        "image": "nginx:1.14.2",
        "name": "nginx",
        "ports": [
          {
            "containerPort": 80
          }
        ],
        "resizePolicy": [],
        "volumeDevices": [],
        "volumeMounts": []
      }
    ],
    "ephemeralContainers": [],
    "hostAliases": [],
    "imagePullSecrets": [],
    "initContainers": [],
    "nodeSelector": {},
    "readinessGates": [],
    "resourceClaims": [],
    "schedulingGates": [],
    "tolerations": [],
    "topologySpreadConstraints": [],
    "volumes": []
  }
}
manusa commented 4 months ago

The Fabric8-generated builders always initialize the List fields. The Fabric8-generated types have empty Maps and empty Lists as default values for these kinds of fields. See:

However, the generated type Map fields include the @JsonInclude(JsonInclude.Include.NON_EMPTY) annotation (https://github.com/fabric8io/kubernetes-client/blob/a3efaa3d773de02f42ca16255d66155b8bc97118/kubernetes-model-generator/kubernetes-model-core/src/generated/java/io/fabric8/kubernetes/api/model/PodSpec.java#L125-L127). This means that empty maps shouldn't be serialized (just as it behaves in JVM mode).

This has been the standard Fabric8 Kubernetes Client behavior since 6.0.0 (almost 2 years now).

It seems that the annotation is not respected in native mode. However, I'm not sure if this is due to a change in the Fabric8 code (serializers) or something in Quarkus+Jackson.
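
As a side note, a small standalone Jackson illustration (plain jackson-databind assumed, outside Quarkus and Fabric8; names are illustrative) of the difference this annotation makes for an empty map:

    import java.util.LinkedHashMap;
    import java.util.Map;
    import com.fasterxml.jackson.annotation.JsonInclude;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class IncludeDemo {
        static class WithNonEmpty {
            @JsonInclude(JsonInclude.Include.NON_EMPTY)
            public Map<String, String> overhead = new LinkedHashMap<>();
        }

        static class WithNonNull {
            @JsonInclude(JsonInclude.Include.NON_NULL)
            public Map<String, String> overhead = new LinkedHashMap<>();
        }

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            // NON_EMPTY skips the empty map entirely:
            System.out.println(mapper.writeValueAsString(new WithNonEmpty())); // {}
            // NON_NULL keeps it, because an empty map is not null:
            System.out.println(mapper.writeValueAsString(new WithNonNull()));  // {"overhead":{}}
        }
    }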

galderz commented 4 months ago

@manusa I'll try to git bisect it and see if I can get more clues.

galderz commented 4 months ago

I did a bisect on this. The last version where this worked was 3.7.3 and it broke in 3.7.4. It looks like an unintentional side effect of https://github.com/quarkusio/quarkus/issues/38683. The offending commit is https://github.com/quarkusio/quarkus/commit/943784b27ed68a07c1f8fa3b9c7bb927a12e2b35:

943784b27ed68a07c1f8fa3b9c7bb927a12e2b35 is the first bad commit
commit 943784b27ed68a07c1f8fa3b9c7bb927a12e2b35
Author: Foivos Zakkak <fzakkak@redhat.com>
Date:   Tue Feb 20 15:28:10 2024 +0200

    Ignore `ValidationSchema` that results in registering all models

    `ValidationSchema` is annotated with `@JsonDeserialize` which leads in
    its entire type hierarchy being registered for reflective access along
    with the corresponding methods. This essentially ends up registering all
    models as in kubernetes-client 6.9.0 ValidationSchema was augmented to
    implement `Editable<ValidationSchemaBuilder>`, which increases the
    reachable types in comparison to previous versions.

    Ignoring registrations for `ValidationSchema` aligns with what we
    already do for `KubeSchema`.

    Fixes https://github.com/quarkusio/quarkus/issues/38683

    (cherry picked from commit 637ebeae5ecc85b33a0baf5d976b0e2a683d83f7)

 .../kubernetes/client/deployment/KubernetesClientProcessor.java        | 3 +++
 1 file changed, 3 insertions(+)

@zakkak What shall we do about this?

galderz commented 4 months ago

@manusa Are there any kubernetes serialization tests in Quarkus that can help detect issues like this, which do not require starting up a Kubernetes cluster? If there are none, can you help craft one?

manusa commented 4 months ago

The offending commit is https://github.com/quarkusio/quarkus/commit/943784b27ed68a07c1f8fa3b9c7bb927a12e2b35:

This is surprising, I'm not sure how this might be affecting the way the objects are serialized.

@manusa Are there any kubernetes serialization tests in Quarkus that can help detect issues like this, which do not require starting up a Kubernetes clusters?

I'm not sure, I'd need to check.

If there are none, can you help craft one?

Sure, I would be happy to.

zakkak commented 4 months ago

@zakkak What shall we do about this?

Now that https://github.com/fabric8io/kubernetes-client/pull/5759 is in (since 6.11.0 (2024-03-25) which Quarkus updated to in https://github.com/quarkusio/quarkus/commit/b89706d693ada6b440ea367e6357d00f7ff7cad5), maybe we could try reverting the change, although I would be surprised if it changed anything (due to https://github.com/fabric8io/kubernetes-client/pull/5759)

jorsol commented 4 months ago

Any update on this? If the fix has been upstream since 6.11.0, and reverting the change fixes the serialization issue in the native image, then it would be great to have this working again.

galderz commented 4 months ago

@jorsol No further updates yet.

I'm intrigued as to why the commit mentioned above causes a serialization issue. In the absence of a serialization integration test (cc @manusa), I need to do some further setup to debug this.

In any case, there's a very simple workaround that can be applied.

jorsol commented 4 months ago

In any case, there's a very simple workaround that can be applied.

Tell me more! πŸ˜‰

galderz commented 4 months ago

In any case, there's a very simple workaround that can be applied.

Tell me more! πŸ˜‰

LOL, it's right in the description:

                    .withOverhead(null)

jorsol commented 4 months ago

In any case, there's a very simple workaround that can be applied.

Tell me more! πŸ˜‰

LOL, it's right in the description:

                    .withOverhead(null)

Ouch, but that is specific to the PodBuilder. As you are aware, many other fields get serialized (https://github.com/quarkusio/quarkus/issues/39934#issuecomment-2119789713), so this might affect other things too, not only Pods.

manusa commented 4 months ago

In the absence of a serialization integration test (cc @manusa)

Sorry for the delay, I expect to be able to work on and complete this in a few days :sweat:.

manusa commented 3 months ago

I'm intrigued as to why the commit mentioned above causes a serialization issue. In the absence of a serialization integration test (cc @manusa), I need to do some further setup to debug this.

Hi, @galderz

I created #41274, which should ensure this isn't happening on either JVM or native.

I ran the integration tests (IT) locally in both modes and I wasn't able to reproduce the issue, so maybe I missed something.

galderz commented 3 months ago

I've made some progress with this, but it still needs further work:

What I can see is that, for some as-yet-unknown reason, the @JsonInclude annotations on fields such as ObjectMeta.labels are not found. That annotation is specified as NON_EMPTY, which makes fields like labels be included unless they're empty. Because the annotation is not found, the NON_NULL default is applied instead, so the fields would only be skipped if they were null. Since the fields are merely "empty", they are included in the output.

Strangely, the other annotation on the field, e.g. @JsonProperty for ObjectMeta.labels, is indeed found.

The final piece of the puzzle is how all of this relates to https://github.com/quarkusio/quarkus/commit/943784b27ed68a07c1f8fa3b9c7bb927a12e2b35.

galderz commented 3 months ago

One additional important detail explains the difference between JsonProperty and JsonInclude. Both the native and JVM code locate the getter for the field and extract annotations from the getter method. E.g.

    @JsonProperty("annotations")
    public Map<String, String> getAnnotations() {
        return this.annotations;
    }

The difference is that when JVM mode gets the AnnotatedMethod for that method, it includes both the JsonProperty and JsonInclude annotations. It is quite odd that a method that is only annotated with JsonProperty suddenly gets the JsonInclude annotation too. The field, however, has both:

    @JsonProperty("annotations")
    @JsonInclude(Include.NON_EMPTY)
    private Map<String, String> annotations = new LinkedHashMap();

Maybe there's some union of annotations happening somewhere?

In native, the code also locates the annotated method, but it only finds the annotations present on the method itself. The kind of "union" of annotations that seems to happen in JVM mode is not working there.
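
A sketch of that merge behavior with plain jackson-databind (class and field names are illustrative):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import com.fasterxml.jackson.annotation.JsonInclude;
    import com.fasterxml.jackson.annotation.JsonProperty;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class MergeDemo {
        static class Meta {
            @JsonProperty("annotations")
            @JsonInclude(JsonInclude.Include.NON_EMPTY)
            private Map<String, String> annotations = new LinkedHashMap<>();

            @JsonProperty("annotations")
            public Map<String, String> getAnnotations() {
                return annotations;
            }
        }

        public static void main(String[] args) throws Exception {
            // On the JVM this prints {}: Jackson merges the field's
            // NON_EMPTY into the property even though the getter is used
            // for access. If field reflection is broken (as in the
            // affected native images), the field annotation is invisible
            // and the output becomes {"annotations":{}}.
            System.out.println(new ObjectMapper().writeValueAsString(new Meta()));
        }
    }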

galderz commented 3 months ago

Maybe there's some union of annotations happening somewhere?

Or maybe this is just a bug in the kubernetes client that has gone unnoticed until now: Why would you have a field and its getter annotated differently?

galderz commented 3 months ago

Maybe there's some union of annotations happening somewhere?

Or maybe this is just a bug in the kubernetes client that has gone unnoticed until now: Why would you have a field and its getter annotated differently?

The reason this seems to work in JVM mode is that Jackson takes the field and getter methods, and these are merged into an annotation map. I don't know yet why this doesn't work in native, but it seems sensible to me to have both fields and getters carry the same annotations, or to remove either the field or the getter annotations, so that you don't have duplication.

shawkins commented 3 months ago

Why would you have a field and its getter annotated differently?

@galderz I don't think that is intentional. The generation logic has been updated incrementally over the years; as long as things worked in JVM mode it wouldn't have been noticed that the property and the getter have different annotations. Were you able to test this scenario with the Pod and related classes modified to confirm this is the problem?

@manusa if this is the cause, I can take a pass over the generation logic and see what can be minimized. Not sure if the java generator is also affected.

manusa commented 3 months ago

I'm still puzzled because the test I implemented does work on both Native and JVM. So I'm not sure if this union is non-deterministic or if there's something else going on.

@galderz I don't think that is intentional.

I don't think it is either. The model generation has undergone multiple iterations over the last couple of (now probably 8-9) years, and redundant annotations such as these might have gone unnoticed since this is generated code and tests continue to pass.

@manusa if this is the cause, I can take a pass over the generation logic and see what can be minimized. Not sure if the java generator is also affected.

Thanks @shawkins, that would be great. We can try to include this in the 6.13.1 release.

manusa commented 3 months ago

I created https://github.com/fabric8io/kubernetes-client/pull/6086 which should address the duplicate annotation issue.

You can check the first commit https://github.com/fabric8io/kubernetes-client/pull/6086/commits/f5c1aa8517d968f7c862f298dd85fd39ca305672 to see the relevant changes.

manusa commented 3 months ago

After a few test failures, it looks like annotating only fields won't work.

Regarding @galderz's question

Why would you have a field and its getter annotated differently?

This is common in Jackson: you might want to serialize a field twice using different names/properties.

Or you might want to be able to only deserialize a field (e.g. password in a POJO), so you just annotate the setter.

You can see how this is more or less generalized by checking the original Jackson2Annotator, which will provide the annotations for both the field and the getter/setter.

I can try to work more on our annotator to create the same annotations for the method and the field, but that's as far as we can go (I think).

However, I still think that the generated native image has some flaw as compared to its behavior in JVM mode. Or maybe I'm missing something else.
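
A small plain-Jackson illustration of the deserialize-only case mentioned above (using Access.WRITE_ONLY, one common way to get that effect; the class is illustrative):

    import com.fasterxml.jackson.annotation.JsonProperty;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class Credentials {
        // Accepted on input, never written back out on serialization.
        @JsonProperty(access = JsonProperty.Access.WRITE_ONLY)
        public String password;

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            Credentials c = mapper.readValue("{\"password\":\"s3cret\"}", Credentials.class);
            System.out.println(c.password);                    // s3cret
            System.out.println(mapper.writeValueAsString(c));  // {}
        }
    }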

jorsol commented 3 months ago

There is something else going on; this is the minimal reproducer I could get:

//usr/bin/env jbang "$0" "$@" ; exit $?
//DEPS io.quarkus.platform:quarkus-bom:3.12.0@pom
//DEPS io.quarkus:quarkus-kubernetes-client
//JAVAC_OPTIONS -parameters
//JAVA_OPTIONS -Djava.util.logging.manager=org.jboss.logmanager.LogManager

import java.util.Map;

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;

import io.fabric8.kubernetes.api.model.ObjectMeta;
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;
import io.fabric8.kubernetes.api.model.PodSpec;
import io.fabric8.kubernetes.api.model.PodStatus;
import io.fabric8.kubernetes.client.utils.KubernetesSerialization;
import io.quarkus.runtime.QuarkusApplication;
import io.quarkus.runtime.annotations.QuarkusMain;
import jakarta.inject.Inject;

@QuarkusMain
public class Command implements QuarkusApplication {

  @Inject
  KubernetesSerialization kubeser;

  @Override
  public int run(String... args) throws Exception {
    Pod pod = new PodBuilder()
        .withNewMetadata()
        .withName("nginx")
        .withLabels(Map.of("app", "nginx"))
        .withNamespace("default")
        .endMetadata()
        .build();

    System.out.println(kubeser.asYaml(pod));
    return 0;
  }

  // @JsonDeserialize(using = com.fasterxml.jackson.databind.JsonDeserializer.None.class)
  @JsonInclude(JsonInclude.Include.NON_NULL)
  @JsonPropertyOrder({
      "apiVersion",
      "kind",
      "metadata",
      "spec",
      "status"
  })
  private static final class Demo {
    String apiVersion = "v1";
    String kind = "Demo";
    ObjectMeta metadata;
    PodSpec spec;
    PodStatus status;
  }
}

Note the commented-out @JsonDeserialize: when it is commented out, the reproducer does not work; when it is uncommented, it works.

The native-image analysis:

   15,791 reachable types   (80.6% of   19,592 total)
   28,860 reachable fields  (69.7% of   41,431 total)
  115,618 reachable methods (61.5% of  187,979 total)
    6,938 types,   185 fields, and 29,761 methods registered for reflection
       61 types,    61 fields, and    55 methods registered for JNI access

vs

   15,976 reachable types   (81.5% of   19,606 total)
   29,265 reachable fields  (69.9% of   41,843 total)
  120,286 reachable methods (63.5% of  189,483 total)
    7,149 types, 2,016 fields, and 35,552 methods registered for reflection
       61 types,    61 fields, and    55 methods registered for JNI access

So, it remains a mystery.

jorsol commented 3 months ago

Out of curiosity, has anyone tried to revert https://github.com/quarkusio/quarkus/pull/38886?

If this PR https://github.com/fabric8io/kubernetes-client/pull/5759 has already been released, then it should not be needed and can be reverted; with a bit of luck this is the root issue.

Update: Never mind, it looks like this is not the root issue.

galderz commented 3 months ago

So, it remains a mystery.

Yeah, it does. So far I only see side effects, but I don't have the exact cause yet. More to come.

galderz commented 3 months ago

Out of curiosity, has anyone tried to revert #38886?

No. I think we need to understand the exact link between #38886 and this issue before thinking about the possibility of reverting it. Currently my theory is that, due to reflection registration being missing, things have gotten out of shape, but bringing reflection registration back in full reintroduces the issues that #38886 was trying to solve. So you really need to understand all the pieces before reverting or bringing anything back.

jorsol commented 3 months ago

This might be fixed with https://github.com/fabric8io/kubernetes-client/pull/6101 but that would require a new release of Kubernetes Client.

galderz commented 2 months ago

This might be fixed with fabric8io/kubernetes-client#6101 but that would require a new release of Kubernetes Client.

That would be fixing a symptom but as hinted above, there is an underlying issue here that goes beyond this.

The link between https://github.com/quarkusio/quarkus/pull/38886 and this particular issue is that it causes field reflection for some (or all?) fabric8 types to stop working.

Jackson's databind annotated field/method logic relies on reflection to discover annotated fields and methods (NOTE: I don't think this should really be needed, Jandex has all the annotated field/method information without having to rely on reflection @dmlloyd et al?). E.g.

https://github.com/FasterXML/jackson-databind/blob/jackson-databind-2.17.2/src/main/java/com/fasterxml/jackson/databind/introspect/AnnotatedFieldCollector.java

private Map<String,FieldBuilder> _findFields(...) {
    for (Field f : cls.getDeclaredFields()) { // <- reflection alert!
...

In the above snippet, the declared fields array is empty from 3.7.4 onwards, and this causes the issue here. Since no field annotations are found, only what is annotated on methods is discovered, which is why JsonInclude is not found: it's not on the method.

This class reproduces the issue by looking up the declared fields and methods of a FQN passed in as a parameter. If you pass in io.fabric8.kubernetes.api.model.ObjectMeta with 3.7.3 it prints:

Fields size: 16
Fields: [private java.util.Map io.fabric8.kubernetes.api.model.ObjectMeta.annotations, private java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.creationTimestamp, private java.lang.Long io.fabric8.kubernetes.api.model.ObjectMeta.deletionGracePeriodSeconds, private java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.deletionTimestamp, private java.util.List io.fabric8.kubernetes.api.model.ObjectMeta.finalizers, private java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.generateName, private java.lang.Long io.fabric8.kubernetes.api.model.ObjectMeta.generation, private java.util.Map io.fabric8.kubernetes.api.model.ObjectMeta.labels, private java.util.List io.fabric8.kubernetes.api.model.ObjectMeta.managedFields, private java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.name, private java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.namespace, private java.util.List io.fabric8.kubernetes.api.model.ObjectMeta.ownerReferences, private java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.resourceVersion, private java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.selfLink, private java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.uid, private java.util.Map io.fabric8.kubernetes.api.model.ObjectMeta.additionalProperties]
Methods size: 40
Methods: [public java.util.Map io.fabric8.kubernetes.api.model.ObjectMeta.getLabels(), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getName(), public boolean io.fabric8.kubernetes.api.model.ObjectMeta.equals(java.lang.Object), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.toString(), public int io.fabric8.kubernetes.api.model.ObjectMeta.hashCode(), public java.util.Map io.fabric8.kubernetes.api.model.ObjectMeta.getAnnotations(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setName(java.lang.String), public void io.fabric8.kubernetes.api.model.ObjectMeta.setAnnotations(java.util.Map), public java.util.List io.fabric8.kubernetes.api.model.ObjectMeta.getFinalizers(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setFinalizers(java.util.List), public java.lang.Long io.fabric8.kubernetes.api.model.ObjectMeta.getGeneration(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setGeneration(java.lang.Long), public void io.fabric8.kubernetes.api.model.ObjectMeta.setLabels(java.util.Map), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getSelfLink(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setSelfLink(java.lang.String), public io.fabric8.kubernetes.api.model.ObjectMetaBuilder io.fabric8.kubernetes.api.model.ObjectMeta.toBuilder(), protected boolean io.fabric8.kubernetes.api.model.ObjectMeta.canEqual(java.lang.Object), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getNamespace(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setNamespace(java.lang.String), public io.fabric8.kubernetes.api.model.ObjectMetaBuilder io.fabric8.kubernetes.api.model.ObjectMeta.edit(), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getUid(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setUid(java.lang.String), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getCreationTimestamp(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setCreationTimestamp(java.lang.String), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getDeletionTimestamp(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setDeletionTimestamp(java.lang.String), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getGenerateName(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setGenerateName(java.lang.String), public java.util.List io.fabric8.kubernetes.api.model.ObjectMeta.getManagedFields(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setManagedFields(java.util.List), public java.util.List io.fabric8.kubernetes.api.model.ObjectMeta.getOwnerReferences(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setOwnerReferences(java.util.List), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getResourceVersion(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setResourceVersion(java.lang.String), public void io.fabric8.kubernetes.api.model.ObjectMeta.setAdditionalProperty(java.lang.String,java.lang.Object), public java.lang.Long io.fabric8.kubernetes.api.model.ObjectMeta.getDeletionGracePeriodSeconds(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setDeletionGracePeriodSeconds(java.lang.Long), public java.util.Map io.fabric8.kubernetes.api.model.ObjectMeta.getAdditionalProperties(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setAdditionalProperties(java.util.Map), public java.lang.Object io.fabric8.kubernetes.api.model.ObjectMeta.edit()]

If you do the same with 3.7.4:

Fields size: 0
Fields: []
Methods size: 40
Methods: [public java.util.Map io.fabric8.kubernetes.api.model.ObjectMeta.getLabels(), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getName(), public boolean io.fabric8.kubernetes.api.model.ObjectMeta.equals(java.lang.Object), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.toString(), public int io.fabric8.kubernetes.api.model.ObjectMeta.hashCode(), public java.util.Map io.fabric8.kubernetes.api.model.ObjectMeta.getAnnotations(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setName(java.lang.String), public java.util.List io.fabric8.kubernetes.api.model.ObjectMeta.getManagedFields(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setCreationTimestamp(java.lang.String), public void io.fabric8.kubernetes.api.model.ObjectMeta.setDeletionTimestamp(java.lang.String), public void io.fabric8.kubernetes.api.model.ObjectMeta.setGenerateName(java.lang.String), public void io.fabric8.kubernetes.api.model.ObjectMeta.setManagedFields(java.util.List), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getCreationTimestamp(), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getGenerateName(), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getResourceVersion(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setResourceVersion(java.lang.String), public void io.fabric8.kubernetes.api.model.ObjectMeta.setAdditionalProperty(java.lang.String,java.lang.Object), public java.util.List io.fabric8.kubernetes.api.model.ObjectMeta.getOwnerReferences(), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getDeletionTimestamp(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setOwnerReferences(java.util.List), public io.fabric8.kubernetes.api.model.ObjectMetaBuilder io.fabric8.kubernetes.api.model.ObjectMeta.edit(), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getUid(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setUid(java.lang.String), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getSelfLink(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setSelfLink(java.lang.String), public io.fabric8.kubernetes.api.model.ObjectMetaBuilder io.fabric8.kubernetes.api.model.ObjectMeta.toBuilder(), protected boolean io.fabric8.kubernetes.api.model.ObjectMeta.canEqual(java.lang.Object), public java.util.List io.fabric8.kubernetes.api.model.ObjectMeta.getFinalizers(), public java.lang.String io.fabric8.kubernetes.api.model.ObjectMeta.getNamespace(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setNamespace(java.lang.String), public java.lang.Long io.fabric8.kubernetes.api.model.ObjectMeta.getGeneration(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setAnnotations(java.util.Map), public void io.fabric8.kubernetes.api.model.ObjectMeta.setFinalizers(java.util.List), public void io.fabric8.kubernetes.api.model.ObjectMeta.setGeneration(java.lang.Long), public void io.fabric8.kubernetes.api.model.ObjectMeta.setLabels(java.util.Map), public java.lang.Long io.fabric8.kubernetes.api.model.ObjectMeta.getDeletionGracePeriodSeconds(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setDeletionGracePeriodSeconds(java.lang.Long), public java.util.Map io.fabric8.kubernetes.api.model.ObjectMeta.getAdditionalProperties(), public void io.fabric8.kubernetes.api.model.ObjectMeta.setAdditionalProperties(java.util.Map), public java.lang.Object io.fabric8.kubernetes.api.model.ObjectMeta.edit()]

If you set the same annotations on methods as on fields, you will get around this particular issue, but there could be other annotation functionality that only checks fields, in which case it would still be broken.

The proper fix IMO would be to stop using reflection for discovering annotations. This would reduce binary executable size, make native image builds faster, and avoid issues like this, because the Jandex index has everything.
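
A rough sketch of what that could look like with Jandex (org.jboss.jandex; API names as in Jandex 3.x, and the lookup itself is illustrative, not Quarkus's actual code):

    import org.jboss.jandex.AnnotationInstance;
    import org.jboss.jandex.ClassInfo;
    import org.jboss.jandex.DotName;
    import org.jboss.jandex.FieldInfo;
    import org.jboss.jandex.IndexView;

    public class JandexAnnotationLookup {
        static void printJsonIncludes(IndexView index) {
            DotName jsonInclude = DotName.createSimple("com.fasterxml.jackson.annotation.JsonInclude");
            ClassInfo meta = index.getClassByName(
                    DotName.createSimple("io.fabric8.kubernetes.api.model.ObjectMeta"));
            for (FieldInfo field : meta.fields()) {
                AnnotationInstance ann = field.annotation(jsonInclude);
                if (ann != null) {
                    // No Field reflection involved: the build-time index
                    // already knows every annotated field, native or not.
                    System.out.println(field.name() + " -> " + ann);
                }
            }
        }
    }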

This concludes my investigation as far as this issue is concerned. More investigation could be carried out to find out why https://github.com/quarkusio/quarkus/pull/38886 causes field reflection to stop working. The reproducer I pointed to above could be used for such an investigation, but that goes beyond the scope of this issue /cc @zakkak

manusa commented 2 months ago

Awesome work and investigation, Galder :raised_hands:

Just for the record, ValidationSchema and KubeSchema classes will be gone in Kubernetes Client 7.0.0

Before #38886, every Kubernetes Client model type was registered for reflection because both KubeSchema and ValidationSchema were inherently linking every single model class. I think a new issue should be opened to track (and fix) why the AnnotatedFieldCollector is not working as expected. Probably the traversal process is broken at some point. I would also advocate for using Jandex instead, since it'll bring consistency to both application packaging modes (JVM & Native).

jorsol commented 2 months ago

OK, based on the analysis of declared fields, the issue could lie here:

https://github.com/quarkusio/quarkus/blob/265b721efcf73a09284300bb3550ae930c462b36/extensions/kubernetes-client/deployment/src/main/java/io/quarkus/kubernetes/client/deployment/KubernetesClientProcessor.java#L183-L187

For some reason (intentional or not), the model fields are not being registered, so maybe the logic in populateReflectionRegistrationLists is not correct.

I have opened a draft PR, https://github.com/quarkusio/quarkus/pull/42028, with the fix so it can be evaluated whether it works.

BTW, with this change there seems to be no performance penalty nor size increase in the final native image.
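
For illustration, a hedged sketch of the kind of registration such a fix might involve (the build step and class list are hypothetical; the actual PR may differ):

    import io.quarkus.deployment.annotations.BuildProducer;
    import io.quarkus.deployment.annotations.BuildStep;
    import io.quarkus.deployment.builditem.nativeimage.ReflectiveClassBuildItem;

    class FieldReflectionSketch {
        @BuildStep
        void registerModelFields(BuildProducer<ReflectiveClassBuildItem> reflectiveClasses) {
            // Hypothetical list; the real processor derives the model
            // classes from the Jandex index.
            String[] modelClasses = { "io.fabric8.kubernetes.api.model.ObjectMeta" };
            reflectiveClasses.produce(ReflectiveClassBuildItem.builder(modelClasses)
                    .methods()
                    .fields() // without this, getDeclaredFields() returns [] in native
                    .build());
        }
    }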