knative-extensions / eventing-kafka

Kafka integrations with Knative Eventing.
Apache License 2.0

KafkaChannel Configuration option on a per object base #1005

Open matzew opened 2 years ago

matzew commented 2 years ago

Problem

We currently have a global configuration for one dedicated Apache Kafka cluster: https://github.com/knative-sandbox/eventing-kafka/blob/main/config/channel/consolidated/configmaps/kafka-config.yaml#L24

This forces all KafkaChannel objects to connect to a single Kafka cluster.
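
For illustration, that global ConfigMap is roughly shaped like the abbreviated sketch below (paraphrased, not copied verbatim from the linked file) - a single cluster-wide bootstrap address that every KafkaChannel ends up using:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-kafka
  namespace: knative-eventing
data:
  eventing-kafka: |
    kafka:
      # one bootstrap address shared by every KafkaChannel in the cluster
      brokers: my-cluster-kafka-bootstrap.kafka:9092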

This does not work well when offering "Knative Kafka" as a service, where customers want to use individual Kafka clusters on a per-channel basis.

The global default is a convenient setting for on-prem installations, but not for more cloud-centric cases.

Proposal

Adding the following fields to the KafkaChannel spec

Persona: Which persona is this feature for?

Exit Criteria: A measurable (binary) test that would indicate that the problem has been resolved.

Time Estimate (optional): How many developer-days do you think this may take to resolve?

Additional context (optional): Add any other context about the feature request here.

aliok commented 2 years ago

+1

Why these are missing in the first place:

Although I understand these use cases, it would be really nice to have the ability to pass these in the spec.

Some logic like:

In the managed service case, we wouldn't actually want the fallback. So, maybe also a way to enable/disable the fallback (probably in the ConfigMap)?

pierDipi commented 2 years ago

What about this, similar to KafkaSink:

spec:
  bootstrapServers:
    - my-cluster-kafka-bootstrap.kafka:9092
  auth:
    secret:
      ref:
        name: my_secret

It's a bit more verbose, but it will give us more flexibility in the future regarding "auth(n|z)".

matzew commented 2 years ago

Being able to use the generic Channel

IMO that's a weak part of the API, only interesting on the surface.

matzew commented 2 years ago

In the managed service case, we wouldn't actually want the fallback. So, maybe also a way to enable/disable the fallback (probably in the ConfigMap)?

I'd expect someone to set it up WITHOUT any default in cases like that - hence the fallback would fail because no bootstrap servers are present. The log will tell the devs.

No need to configure this too

travis-minke-sap commented 2 years ago

You would still be limited to a single "configuration" of the Kafka "clients" (i.e. SASL, TLS, etc. in Sarama.Config) - is that an acceptable assumption (i.e. all Kafka instances are similar and use the same set of Kafka root certs, etc.), or should there also be a field pointing to the ConfigMap (in which case the brokers and Secret name/namespace can just stay in the ConfigMap)? We are also watching the ConfigMap / Secret for changes and reacting to them in the controller - we would have to consider the impacts there (watching multiple/all such instances, etc.).

I thought the intent was to use the "scope" (namespaced) annotation to allow a distinct ConfigMap / Secret per namespace? (Not that we support that fully today or anything ; ) This allows for a middle ground where each Kafka instance is specified once, without requiring every Topic in the Kafka instance to replicate it. I haven't thought through that solution completely, so maybe there are blockers/difficulties? Has that annotation-based approach been scrapped, or are the two supposed to co-exist?

The intent is valid/good (multi-tenant separate Kafka instance support) and in general this approach is ok as long as there is fallback to current usage for those who don't want to have to specify the brokers every time (because they will always only have a single instance).
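
As a sketch of the annotation-based approach mentioned above (the annotation key shown is an assumption for illustration only - the thread just calls it the "scope" annotation), a KafkaChannel opting into a per-namespace ConfigMap / Secret might look like:

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: kafka-channel
  annotations:
    # illustrative key; the exact annotation is not specified in this thread
    eventing.knative.dev/scope: namespaced
spec:
  numPartitions: 1
  replicationFactor: 1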

pierDipi commented 2 years ago

You would still be limited to a single "configuration" of the Kafka "clients" (i.e. SASL, TLS, etc. in Sarama.Config) - is that an acceptable assumption (i.e. all Kafka instances are similar and use the same set of Kafka root certs, etc.), or should there also be a field pointing to the ConfigMap (in which case the brokers and Secret name/namespace can just stay in the ConfigMap)? We are also watching the ConfigMap / Secret for changes and reacting to them in the controller - we would have to consider the impacts there (watching multiple/all such instances, etc.).

See https://github.com/knative-sandbox/eventing-kafka/issues/1005#issuecomment-982469704; auth and bootstrap servers would be set for each KafkaChannel instance.

I thought the intent was to use the "scope" (namespaced) annotation to allow a distinct ConfigMap / Secret per namespace? (Not that we support that fully today or anything ; ) This allows for a middle ground where each Kafka instance is specified once, without requiring every Topic in the Kafka instance to replicate it. I haven't thought through that solution completely, so maybe there are blockers/difficulties? Has that annotation-based approach been scrapped, or are the two supposed to co-exist?

This might not work in all cases.

The intent is valid/good (multi-tenant separate Kafka instance support) and in general this approach is ok as long as there is fallback to current usage for those who don't want to have to specify the brokers every time (because they will always only have a single instance).

Agree!

cardil commented 2 years ago

I don't like the idea of adding auth to the channel object. Instead, a better approach would be to create an additional CR, KafkaAuthentication (or even one more, KafkaAuthenticationBinding).

As @aliok mentioned, adding auth breaks the option to use the default channel and creeps administrative information into a developer-scoped type.

Putting auth in a separate type allows us to keep those important functions. It also allows for a centralized authentication model that can be fine-tuned in various ways - matching by ref, by namespace, by prefix. It's also less verbose, as you don't need to type auth in every object (sometimes hundreds of times).

aliok commented 2 years ago

The intent is valid/good (multi-tenant separate Kafka instance support) and in general this approach is ok as long as there is fallback to current usage for those who don't want to have to specify the brokers every time (because they will always only have a single instance).

Glad to hear this @travis-minke-sap, thanks for your support!

What I would want to have in the spec is at least the bootstrapServers and the auth secret name/namespace.

... or should there also be a field pointing to the ConfigMap (in which case the brokers and Secret name/namespace can just stay in the ConfigMap)?

I think I would still prefer having the brokers and the secret name+namespace as separate fields in the spec, plus another field pointing to a ConfigMap. In this case, the ConfigMap wouldn't have the broker and auth info in it. Or it can, but then there would be an order of preference for the broker and secret info:

So, let me add some YAML:

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: kafka-channel
spec:
  bootstrapServers:
    - my-cluster-kafka-bootstrap.kafka:9093
  auth:
    secret:
      ref:
        name: my_secret
        # not 100% sure about this
        namespace: foo
  config:
    ref:
      name: my-config
      namespace: bar

We can fall back to the my-config ConfigMap (referenced in the config field) for missing spec.bootstrapServers and spec.auth fields. If that information also doesn't exist in my-config, we can fall back to the global config, config-kafka in knative-eventing.

I think this way all channel implementations get the fields they need, and things would still work even when users don't change anything.
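
As a minimal sketch of that precedence (names are illustrative), a KafkaChannel that only sets spec.bootstrapServers would resolve its auth from the referenced my-config ConfigMap and, failing that, from the global config-kafka:

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: kafka-channel-minimal
spec:
  bootstrapServers:
    - my-cluster-kafka-bootstrap.kafka:9093
  config:
    ref:
      name: my-config
      namespace: bar
  # no spec.auth here: auth falls back to my-config,
  # then to the global config-kafka in knative-eventing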

About the default channel...

The changes above would still keep the generic Channel resource working just like before, as things will fall back to the global config.

Another thing I mentioned:

In the managed service case, we wouldn't actually want the fallback. So, maybe also a way to enable/disable the fallback (probably in the ConfigMap)?

We can do this in the KafkaChannel implementations. The config doesn't have a strong API anyway, so the implementation docs can say "add field XYZ to prevent fallback".

matzew commented 2 years ago

+1 on the bootstrap server field

not sure here:

    secret:
      ref:
        name: my_secret
        # not 100% sure about this
        namespace: foo

I'd kick that.

Instead, let's start w/ duck typing a spec.config, like we have on the Broker (do we care about the generic Channel API?)

E.g:

...
spec:
  config:
    apiVersion: v1
    kind: ConfigMap
    name: matze-kafka-channel-config
    namespace: mwessend-dev
...

The referenced config map would follow the schema of the auth parts:

apiVersion: v1
kind: ConfigMap
metadata:
  name: matze-kafka-channel-config
  namespace: mwessend-dev
data:
  auth.secret.ref.name: kafka-sasl-secret

That follows the same syntax as in: https://knative.dev/docs/eventing/broker/kafka-broker/#security
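
For context, the referenced kafka-sasl-secret would then hold the actual credentials. A sketch of such a Secret, assuming SASL over TLS and the key names used in the linked Kafka Broker security docs (values are placeholders, and the namespace is assumed to match the ConfigMap's):

apiVersion: v1
kind: Secret
metadata:
  name: kafka-sasl-secret
  namespace: mwessend-dev
stringData:
  protocol: SASL_SSL
  sasl.mechanism: SCRAM-SHA-512
  user: my-user
  password: my-password
  ca.crt: <PEM-encoded CA certificate>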

Custom Kafka Config CRDs

In a later version, we can create those and make the spec.config understand them too.

aliok commented 2 years ago

Instead, let's start w/ duck typing a spec.config, like we have on the Broker (do we care about the generic Channel API?)

That's also an option. What I proposed is consistent with KafkaSink; what you proposed is consistent with KafkaBroker. I don't have a strong opinion, but to me every KafkaChannel implementation will need the bootstrapServers info and also the auth secret info. I prefer having more strongly typed API fields for these, but I am fine with both.

pierDipi commented 2 years ago

I'd recommend following KafkaSink instead of KafkaBroker, because the ConfigMap is an external object whose lifecycle management easily becomes complicated, and I don't see a reason for it when you can have proper spec fields.

In addition, with a ConfigMap, we also lose the ability to do validation at the webhook level (which has a nicer UX).

matzew commented 2 years ago

The sink approach is also easier to migrate over to an actual type.

aliok commented 2 years ago

OK, so do you agree with this one?

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: kafka-channel
spec:
  bootstrapServers:
    - my-cluster-kafka-bootstrap.kafka:9093
  auth:
    secret:
      ref:
        name: my_secret
        # not 100% sure about this
        namespace: foo
  config:
    ref:
      name: my-config
      namespace: bar

matzew commented 2 years ago

IMO this looks good, except for the config fallback. Don't like that.

aliok commented 2 years ago

IMO this looks good, except for the config fallback. Don't like that.

I think we need the fallback because we need to be backwards compatible, and we need to support the scenarios where the broker and auth info are not in the spec.

aliok commented 2 years ago

BTW, one implication of having these in the spec is that reconciliation is more expensive.

We can still cache the global config (the config-kafka ConfigMap).

cc @travis-minke-sap @pierDipi

pierDipi commented 2 years ago

I think the config ref could be part of a separate discussion.

I'm imagining this high-level logic:

We won't be able to cache the Sarama client and the admin client anymore.

This is a good point, but I don't think we do this even today because Sarama is buggy on long-running connections, do we?

We need to read the configmap referenced in the spec in every reconciliation cycle.

That's the reason I'd separate the two discussions: bootstrapServers/auth from the config.

travis-minke-sap commented 2 years ago

Hey - sorry for not responding earlier - been wrapping up some internal work...

The Distributed KafkaChannel (receiver and dispatcher deployments) is currently watching a Secret (kafka-cluster), specified via environment variables, in order to dynamically re-create the Sarama clients when the SASL config changes. This is done to support automatic rotation of client passwords / tokens.

Both KafkaChannel implementations dynamically create new Sarama admin clients for every Topic creation / deletion.

If the Secret is specified in the KafkaChannel spec, then this would require control-plane -> data-plane communication from the Controller to the Receiver / Dispatchers to let them know to start using a different Secret. This could be a control-protocol type communication (in-place), or the Controller updating environment variables on the respective Deployments in order to force a rollout restart.

The main problem for the Distributed KafkaChannel though, is that there is only a single "Receiver" Deployment for the entire cluster. This instance services ALL KafkaChannels and couldn't be tied to a specific auth/config in a single KafkaChannel. Not sure how we'd address this without tackling multi-tenant namespacing.

I don't want the run-time architectural needs of the Distributed KafkaChannel to prevent you from moving forward with this more granular configuration, but we will need to present a coherent documented description of what level of configuration is supported by which components. Or have a plan to ensure all the components can support the new structure?

Maybe it's worth considering a split in the config for the new Java implementations and the existing components. I know it is too early to consider deprecating things, but sometimes dragging old baggage forward isn't worth it? Just something to stew on ; )

aliok commented 2 years ago

That's the reason I'd separate the two discussions: bootstrapServers/auth from the config.

Ok, let's leave the config reference discussion here.

aliok commented 2 years ago

Thanks for the detailed comments, @travis-minke-sap.

I don't want the run-time architectural needs of the Distributed KafkaChannel to prevent you from moving forward with this more granular configuration, but we will need to present a coherent documented description of what level of configuration is supported by which components. Or have a plan to ensure all the components can support the new structure?

We can add validation in the distributed channel webhook to reject channel resources that use the proposed fields, and we can create good documentation about the implementation differences around the fields.

Would this be an option?

aliok commented 2 years ago

Quick note: there's another use case: cluster admins don't want developers to specify the bootstrap servers and secret in the spec, as they want to manage these themselves. So, we need something additional to enforce this.

thx @lionelvillard

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

pierDipi commented 2 years ago

/remove-lifecycle stale
/triage accepted