strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

Add apiVersion, kind, and metadata to CRD schema #5997

Closed · maxbrunet closed this issue 9 months ago

maxbrunet commented 2 years ago

Is your feature request related to a problem? Please describe.

I am trying to make my CI pipeline stricter so it catches unknown fields at the root of Kubernetes manifests, but it fails on Strimzi CRs because their schemas are missing the fields apiVersion, kind, and metadata.

Describe the solution you'd like

Could we add these fields to the schema of all CRDs, please?

These are the typical properties the CRD schema should have at spec.versions.schema.openAPIV3Schema.properties:

apiVersion:
  description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  type: string
kind:
  description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  type: string
metadata:
  type: object

Describe alternatives you've considered

Patch the schemas myself, with Kustomize for example; see the sketch below.
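
For illustration, such a patch could look roughly like this. This is a minimal sketch assuming a Kustomize version that supports JSON6902 patches and remote resources (otherwise, vendor the CRD file locally); the CRD name and version index are just examples, and the release URL is the one used later in this issue:

# kustomization.yaml
resources:
  - https://github.com/strimzi/strimzi-kafka-operator/releases/download/0.26.0/strimzi-crds-0.26.0.yaml
patches:
  - target:
      kind: CustomResourceDefinition
      name: kafkas.kafka.strimzi.io
    patch: |-
      # JSON6902 add operations for the three missing root properties
      - op: add
        path: /spec/versions/0/schema/openAPIV3Schema/properties/apiVersion
        value:
          type: string
      - op: add
        path: /spec/versions/0/schema/openAPIV3Schema/properties/kind
        value:
          type: string
      - op: add
        path: /spec/versions/0/schema/openAPIV3Schema/properties/metadata
        value:
          type: object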

Additional context

I use instrumenta/kubeval or yannh/kubeconform to validate the manifests, together with this script, which creates a JSON schema from each CRD and injects additionalProperties: false into every object that has properties: https://github.com/yannh/kubeconform/blob/master/scripts/openapi2jsonschema.py

$ openapi2jsonschema.py https://github.com/strimzi/strimzi-kafka-operator/releases/download/0.26.0/strimzi-crds-0.26.0.yaml
JSON schema written to kafka_v1beta2.json
JSON schema written to kafkaconnect_v1beta2.json
JSON schema written to kafkatopic_v1beta2.json
JSON schema written to kafkatopic_v1beta1.json
JSON schema written to kafkatopic_v1alpha1.json
JSON schema written to kafkauser_v1beta2.json
JSON schema written to kafkauser_v1beta1.json
JSON schema written to kafkauser_v1alpha1.json
JSON schema written to kafkamirrormaker_v1beta2.json
JSON schema written to kafkabridge_v1beta2.json
JSON schema written to kafkaconnector_v1beta2.json
JSON schema written to kafkamirrormaker2_v1beta2.json
JSON schema written to kafkarebalance_v1beta2.json
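
For reference, the generated schemas can then be used along these lines; this is a sketch assuming a recent kubeconform, where manifests/ is an illustrative directory and the -schema-location template matches the file names the script writes:

$ kubeconform -strict -summary \
    -schema-location default \
    -schema-location '{{ .ResourceKind }}_{{ .ResourceAPIVersion }}.json' \
    manifests/
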
scholzj commented 2 years ago

Do you have any documentation suggesting they should be there? They do not seem to be covered by the example in the Kube docs, for instance.

maxbrunet commented 2 years ago

Thank you for the fast reply. Quickly looking around, I have not found any documentation that is explicit about it, but it seems to be the convention. Among the 118 CR schemas I have, 16 are missing the fields; of those, 3 are empty/removed apiVersions, and 13 are Strimzi's.

scholzj commented 2 years ago

TBH, I'm not sure what exactly you are trying to do with these tools. The CRs have to contain things such as the API version, kind, or metadata regardless of the schema - Kube does not let you create them without those. So what are you actually validating, and how? You talk about CRs - but it is you who creates the CRs. So what makes them invalid? I'm not sure I follow where the problem is, since in Kube everything seems to work fine.

Also, on the practical level:

Another thing to keep in mind is that a change which fixes this for you might break things for someone else. So I think it would be good to have some official docs explaining that these items should be part of the validation schema, which does not seem to be the case. Otherwise it is not completely clear that this does not break other valid situations.

maxbrunet commented 2 years ago

> So what are you actually validating, and how? You talk about CRs - but it is you who creates the CRs. So what makes them invalid?

Yes, and as any human, I make mistakes, especially typos 😅. The CI validates the manifests as if it were running tests against a code base: if a manifest is proven invalid, the PR cannot be merged. This makes reviews easier for humans, as we do not have to worry about the fields being correct, well indented, and so on; if the CI passes, those points are good. If an invalid manifest gets through, the CD pipeline will choke on it. It can be about detecting a simple typo like appVersion, but also about detecting fields that should simply not be there. The other day, I misplaced a bracket in Jsonnet, which nested extra fields containing 2 resources into the root of another; the CD deployed the valid resources and left the problematic resource un-applied, which broke the monitoring of the (non-production) target cluster.

kubernetes-sigs/kubebuilder, which I believe can be considered a reference, consistently adds these fields to generated CRDs. The same is true for operator-framework/operator-sdk (sorry, I am not familiar enough with their code base to point you to where it happens):

https://github.com/kubernetes-sigs/kubebuilder/blob/v3.2.0/testdata/project-v3/config/crd/bases/crew.testproject.org_admirales.yaml#L24-L35

https://github.com/operator-framework/operator-sdk/blob/v1.15.0/testdata/ansible/memcached-operator/config/crd/bases/cache.example.com_memcacheds.yaml#L20-L31

The schema sample I shared in the issue description is the same for all the CRDs I have; only the description fields are slightly inconsistent (either this one, a version containing line breaks, or missing). Only Strimzi does not have it.

It seems to me that adding these properties is harmless. The schema { "metadata": { "type": "object" } } allows any properties, so there is no need to worry about changes within.

scholzj commented 2 years ago

> Yes, and as any human, I make mistakes, especially typos 😅. The CI validates the manifests as if it were running tests against a code base: if a manifest is proven invalid, the PR cannot be merged. This makes reviews easier for humans, as we do not have to worry about the fields being correct, well indented, and so on; if the CI passes, those points are good. If an invalid manifest gets through, the CD pipeline will choke on it. It can be about detecting a simple typo like appVersion, but also about detecting fields that should simply not be there. The other day, I misplaced a bracket in Jsonnet, which nested extra fields containing 2 resources into the root of another; the CD deployed the valid resources and left the problematic resource un-applied, which broke the monitoring of the (non-production) target cluster.

I understand why you want to validate things. But I'm not sure I really understand the issue. The schema in the CRDs is valid and validates the resources successfully.

> It seems to me that adding these properties is harmless. The schema { "metadata": { "type": "object" } } allows any properties, so there is no need to worry about changes within.

Well, but that is also useless as it doesn't do any validation.

scholzj commented 2 years ago

Anyway, let's keep it open and see how it goes.

maxbrunet commented 2 years ago

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
foobar: # <- I want these tools to be able to fail if there is an extra key
        # this key could be meant to be somewhere else

For tools like kubeval to know whether there are extra keys, the schema needs to be complete, including keys that are implicitly validated by the kube-apiserver (otherwise apiVersion and the others are mistaken for extra keys).
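
To make the mechanics concrete, this is roughly the root of a schema produced by openapi2jsonschema.py today (an illustrative sketch, not the script's exact output):

type: object
additionalProperties: false  # injected by the script
properties:
  spec:
    type: object
    # ... the full Strimzi spec schema ...
  status:
    type: object
    # ... the status schema ...

Because apiVersion, kind, and metadata are absent from properties while additionalProperties is false, a strict validator flags those three keys as unexpected on every otherwise valid resource.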

Thank you for keeping it open. I know it is not absolutely required for the operator to work, but it can be considered a nice-to-have in my opinion 🙂

scholzj commented 2 years ago

Triaged on 10.5.2022: There are backwards-compatibility concerns about whether adding this to the schema of existing resources would break some other tools. We should consider adding this in the next API version (probably v1?).