knative / eventing-contrib

Event Sources
Apache License 2.0
224 stars 225 forks

Make consistent configuration of Kafka event source and channel #655

Closed aslom closed 4 years ago

aslom commented 5 years ago

Expected Behavior

When I have configured Kafka brokers, I expect to be able to reuse the configuration for Kafka sources and channels.

Ideal behavior would be to allow reuse of existing Kafka operators that make Kafka available inside the Kubernetes cluster. Today, a developer or system integrator instead needs to do error-prone copying and pasting, and sometimes translation between different configuration formats.

Persona: Which persona is this feature for? Event consumer (developer), Event producer, System Integrator

Actual Behavior

The configuration formats are very different and not consistent. Moreover, some aspects can be configured only for the source and not for the channel (and possibly vice versa).

Exit Criteria

The same configuration can be re-used for the Kafka event source and channel.

Investigate and document inconsistencies with Kafka operators.

Steps to Reproduce the Problem

  1. Configure a Kafka broker, or use a Kafka operator (for example https://operatorhub.io/?keyword=kafka or https://github.com/IBM/event-streams-topic#example )
  2. Configure Kafka source - follow https://knative.dev/development/eventing/samples/kafka/source/
  3. Configure Kafka channel - follow https://knative.dev/development/eventing/samples/kafka/channel/
  4. Compare configuration artifacts created in previous steps

Additional Info

For examples of reported inconsistencies, see https://github.com/knative/eventing-contrib/issues/323 and https://github.com/knative/eventing-contrib/issues/643

aslom commented 5 years ago

Comments - what are your thoughts about the importance of making Kafka configuration consistent in Knative? @lionelvillard @matzew @prestona @lberk

aslom commented 5 years ago

A related issue may be keeping future additional configuration consistent by reusing Kafka-specific code across the source and channel?

aslom commented 5 years ago

And maybe have one controller for both Kafka channel and source? https://github.com/knative/eventing-contrib/issues/533

matzew commented 5 years ago

the consumer part is already reused, and yes, happy to make the channel cfg like the source ones

aslom commented 5 years ago

Some configuration options might be reusable not only across the Kafka source and channel, but also across any channel and source, such as resource limits? https://github.com/knative/eventing-contrib/issues/364

aslom commented 5 years ago

@matzew I will put together a doc to show the proposed configuration and identify what is missing. Would a Google doc work?

matzew commented 5 years ago

yeah, sounds good

aslom commented 5 years ago

@matzew @lionelvillard I have written down the current state of how Knative/Kubernetes Kafka configuration is done and started on a proposal for improvements - please edit it and comment: https://docs.google.com/document/d/1PLYm8amLRKmKK5FAo_34ZOKwTbYU9ooMjrjwkiPc7qA/edit?usp=sharing

matzew commented 5 years ago

added comments

aslom commented 5 years ago

Thank you for your comments @matzew - I added replies and a first stab at what the shared config map / secret could look like.

aslom commented 5 years ago

@matzew @lberk @lionelvillard I investigated options for Kafka configuration and what is used today within Apache Kafka and Docker images. If Kafka configurations already exist, we should make it easy to use them in Knative without any changes. What I propose is to use Apache Kafka property names directly and allow overriding them with a well-defined list of environment variables - I started writing it down in https://docs.google.com/document/d/1PLYm8amLRKmKK5FAo_34ZOKwTbYU9ooMjrjwkiPc7qA/edit# Please let me know what you think and add your comments.

prestona commented 5 years ago

Reading through @aslom's proposal, I thought I'd have a go at drawing up a straw man of what the configuration could look like. Here's where I got to:

apiVersion: sources.eventing.knative.dev/v1alpha1
kind: KafkaSource
metadata:
  name: my-kafka-source
spec:
  specFrom:
    configMapRef: kafka-cluster-config
  group.id: consumer-group-id
  topics: topic1,topic2
---
apiVersion: messaging.knative.dev/v1alpha1
kind: KafkaChannel
metadata:
  name: my-kafka-channel
spec:
  specFrom:
    configMapRef: kafka-cluster-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-cluster-config
data:
  bootstrap.servers: broker-0:9093,broker-1:9093,broker-2:9093
  security.protocol: sasl_ssl
  sasl.mechanisms: PLAIN
  sasl.username:
    secretKeyRef:
      name: kafka-creds-secret
      key: user
  sasl.password:
    secretKeyRef:
      name: kafka-creds-secret
      key: password

In addition to renaming the configuration properties to match the equivalent Kafka broker / consumer / producer properties, I have tried to find a way for the source and channel to share a set of common configuration. Kubernetes pods can share a common set of environment variables using envFrom and configMapRef - for example:

containers:
  - envFrom:
      - configMapRef:
          name: common-env

I've tried to adapt this with specFrom and configMapRef which would have the semantic of merging the contents of a config map with the other properties specified under spec. AFAIK pods don't support an analogous merging of referenced config and inline config, so this would be a difference in behaviour. There's also the potential for the same property name (but with different values) to appear in both places, for which there would need to be a precedence rule.
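For illustration, the merge-with-precedence semantic could be sketched like this (a hypothetical Go helper, not actual Knative controller code; "inline wins" is just one possible choice for the precedence rule):

```go
package main

import "fmt"

// mergeSpec overlays inline spec properties on top of the properties loaded
// from the referenced ConfigMap. Inline values take precedence, resolving
// the duplicate-key case with a simple "local wins" rule.
// (Hypothetical sketch, not actual controller code.)
func mergeSpec(fromConfigMap, inline map[string]string) map[string]string {
	merged := make(map[string]string, len(fromConfigMap)+len(inline))
	for k, v := range fromConfigMap {
		merged[k] = v
	}
	for k, v := range inline {
		merged[k] = v // inline (more specific) overrides shared config
	}
	return merged
}

func main() {
	shared := map[string]string{
		"bootstrap.servers": "broker-0:9093,broker-1:9093",
		"security.protocol": "sasl_ssl",
	}
	inline := map[string]string{
		"security.protocol": "ssl", // duplicated key: inline wins
		"group.id":          "consumer-group-id",
	}
	fmt.Println(mergeSpec(shared, inline)["security.protocol"]) // ssl
}
```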

You might have also noticed that I've deviated from the Kafka Java client property names for SASL. That's because the Java version looks something like this:

sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="user123" password="passw0rd";

This format is heavily influenced by Java's JAAS login configuration file. Instead, I was inspired by the librdkafka configuration, which seems a bit more platform agnostic.

Finally, in addition to using secretKeyRef for referencing the SASL username and passwords, I think bootstrap.servers (and possibly other properties) should also support indirection via secretKeyRef and configMapKeyRef. This would allow interoperation with the various Kubernetes operators that provision Kafka and use their own formats for storing data about the provisioned Kafka cluster in secrets or config maps.
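As a sketch of how a controller could resolve such indirection (hypothetical types, with a stubbed lookup standing in for a real Kubernetes client; this is not the actual Knative API):

```go
package main

import "fmt"

// KeyRef points at a key inside a named Secret or ConfigMap.
// Hypothetical type for illustration only.
type KeyRef struct {
	Name string
	Key  string
}

// PropertyValue is either a string literal or an indirect reference.
type PropertyValue struct {
	Literal      string
	SecretKeyRef *KeyRef
}

// resolve returns the literal if no reference is set; otherwise it looks up
// the referenced key. lookup stands in for a Kubernetes API client call.
func resolve(v PropertyValue, lookup func(name, key string) string) string {
	if v.SecretKeyRef != nil {
		return lookup(v.SecretKeyRef.Name, v.SecretKeyRef.Key)
	}
	return v.Literal
}

func main() {
	// Fake "cluster state": a secret created by some Kafka operator.
	secrets := map[string]string{"kafka-creds-secret/user": "alice"}
	lookup := func(name, key string) string { return secrets[name+"/"+key] }

	fmt.Println(resolve(PropertyValue{Literal: "broker-0:9093"}, lookup))
	fmt.Println(resolve(PropertyValue{
		SecretKeyRef: &KeyRef{Name: "kafka-creds-secret", Key: "user"},
	}, lookup))
}
```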

prestona commented 5 years ago

Hmmn. Config maps are string key/value pairs, and therefore cannot contain the structure of a secretKeyRef. That's a shame. Using a custom resource for the shared config is a less attractive option.

aslom commented 5 years ago

@prestona it would be nice, but it seems no support for it is forthcoming :( https://github.com/kubernetes/kubernetes/issues/79224

lionelvillard commented 5 years ago

@aslom any progress?

prestona commented 5 years ago

@lionelvillard - well, I think we've ruled out one potential approach :smile:

Mulling this over, I can think of two alternate approaches that we could take:

Option 1

Implement something similar to my comment, here, but using a CR rather than a config map for the common configuration. I don't really like this approach, as I can't point at another example where this has been done. There's also the question of what to do if the same configuration property is present in, say, both the source resource and this custom configuration resource - for example:

apiVersion: sources.eventing.knative.dev/v1alpha1
kind: KafkaSource
metadata:
  name: my-kafka-source
spec:
  specFrom:
    kafkaCommonRef: kafka-cluster-config
  security.protocol: ssl                # duplicated in KafkaCommon CR
  group.id: consumer-group-id
  topics: topic1,topic2
---
apiVersion: sources.eventing.knative.dev/v1alpha1
kind: KafkaCommon
metadata:
  name: kafka-cluster-config
data:
  bootstrap.servers: broker-0:9093,broker-1:9093,broker-2:9093
  security.protocol: sasl_ssl          # duplicated in KafkaSource
  sasl.mechanisms: PLAIN
  # etc. etc.

In this case, which value of security.protocol do we use? Again, I haven't come across resources which have an order of precedence. So I wonder if this will surprise users, and be difficult to debug if the duplication is introduced by accident.

Option 2

Move away from trying to store common configuration in one resource and allow individual properties of the Kafka source / channel to reference configuration properties from a config map or secret. As a starting point, we could require sasl.username and sasl.password to be a secretKeyRef and allow bootstrap.servers to either be a string literal, a configMapRef or a secretKeyRef. In the future, we might want to extend this to having more configuration properties supporting indirection.

For example:

apiVersion: sources.eventing.knative.dev/v1alpha1
kind: KafkaSource
metadata:
  name: my-kafka-source
spec:
  bootstrap.servers: 
    configMapKeyRef:          # point to common configuration
      name:  kafka-cluster-config
      key: bootstrap.servers
  security.protocol: sasl_ssl
  sasl.mechanisms: PLAIN
  sasl.username:
    secretKeyRef:
      name: kafka-creds-secret
      key: user
  sasl.password:
    secretKeyRef:
      name: kafka-creds-secret
      key: password
  group.id: consumer-group-id
  topics: topic1,topic2
---
apiVersion: messaging.knative.dev/v1alpha1
kind: KafkaChannel
metadata:
  name: my-kafka-channel
spec:
  bootstrap.servers: 
    configMapKeyRef:          # point to common configuration
      name:  kafka-cluster-config
      key: bootstrap.servers
  security.protocol: sasl_ssl
  sasl.mechanisms: PLAIN
  sasl.username:
    secretKeyRef:
      name: kafka-creds-secret
      key: user
  sasl.password:
    secretKeyRef:
      name: kafka-creds-secret
      key: password
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-cluster-config
data:
  bootstrap.servers: broker-0:9093,broker-1:9093,broker-2:9093

Obviously this is a bit more verbose, and has property names (but not values) duplicated across both the configuration of the source and channel. However, the benefit is that it has enough flexibility to allow Kafka-specific configuration to be stored in one place (perhaps a secret generated by an operator that provisions Kafka) and then referenced from multiple sources and/or channels. Of the two options, this would be my preference.

alanconway commented 5 years ago

> Implement something similar to my comment, here, but using a CR rather than a config map for the common configuration. I don't really like this approach, as I can't point at another example where this has been done. ...

I do like this approach, provided you specify the simple precedence rule that "local" values given directly in the YAML are preferred over "shared" values from a config map.

This kind of "layered" config is very common; for example, command-line tools read a config file, environment variables, and command-line flags: the command-line flags (most local) get the highest precedence and the config file (most shared) gets the lowest. I don't think it would be surprising to most developers, and the benefit is that it supports a range of configuration styles.

> In this case, which value of security.protocol do we use? ...

The notion of "nearest" or "more specific" config taking precedence is common enough in the software world at large that I don't think it would cause confusion.

> Option 2: Move away from trying to store common configuration in one resource and allow individual properties of the Kafka source / channel to reference configuration properties from a config map or secret. ...

Seems to me you can combine the two options: specFrom loads everything in a config map as the default config. Subsequent config items can specify values directly in the YAML, or as references to other config map or secret entries as you show. A single, flexible, standardized approach to config to be used consistently by all sources (eventually).


lionelvillard commented 5 years ago

One scenario to consider is making it easy for k8s services that use Kafka clients to consume this configuration. That's usually done by mounting a config map and secret, so for this scenario I would probably stay away from a CRD for the common shareable configuration.

I'm fine with overriding at the CRD level (as @alanconway suggests). This is common.

aslom commented 5 years ago

It looks to me as well that overriding shared configuration with a layered approach would be the best developer/operator experience @alanconway @lionelvillard @prestona

Now that 0.10 is released, I am going to open a work-in-progress PR so we can test what works.

aslom commented 5 years ago

I have put together slides that describe the problem and proposed solution with overriding https://docs.google.com/presentation/d/1FaOYpenTlvZ5jq8ZYW56cX5TAYI9g42vElWfzMI2pTE/edit?usp=sharing

aslom commented 5 years ago

/assign

aslom commented 5 years ago

As I am implementing the initial design, I realized a lot of things can be simplified by using a kafka.properties file as a secret that is then passed to the Kafka source and channel. That file provides properties that can be easily copied from an existing Apache Kafka (or librdkafka) setup.

For example using kafka.properties from Apache Kafka I can create one shared secret and use it to configure Kafka source:

$ cat kafka.properties
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="T7f3xuwnKG0Q5V71" password="Xwup9VRp9diu4CD04vHLBIe9cLlAy3QP";
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
ssl.endpoint.identification.algorithm=HTTPS
bootstrap.servers=my-server:9093
$ kubectl create secret generic kafka-binding --from-file=kafka.properties

apiVersion: sources.eventing.knative.dev/v1alpha1
kind: KafkaSource
metadata:
  name: kafka-source
spec:
  binding:
    name: kafka-binding
  bootstrapServers: REPLACE_WITH_CLUSTER_URL
...

Here bootstrapServers in the YAML overrides the bootstrap.servers property from the kafka.properties file (list of properties to consider: https://kafka.apache.org/documentation/#consumerconfigs )
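A minimal sketch of how a controller could parse kafka.properties and apply a YAML-level override (hypothetical helper; real Java .properties parsing also handles escapes and line continuations, which this ignores):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseProperties parses a Java-style .properties file (one key=value per
// line, '#' comments, blank lines ignored) into a map. Minimal sketch only.
func parseProperties(contents string) map[string]string {
	props := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if k, v, ok := strings.Cut(line, "="); ok {
			props[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return props
}

func main() {
	// Contents of the kafka.properties key from the mounted secret.
	kafkaProperties := "security.protocol=SASL_SSL\nbootstrap.servers=my-server:9093\n"
	props := parseProperties(kafkaProperties)

	// spec.bootstrapServers from the KafkaSource YAML, if set, wins over
	// the bootstrap.servers value in the file (layered override).
	if yamlBootstrap := "other-server:9093"; yamlBootstrap != "" {
		props["bootstrap.servers"] = yamlBootstrap
	}
	fmt.Println(props["bootstrap.servers"]) // other-server:9093
}
```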

Also updated https://docs.google.com/document/d/1PLYm8amLRKmKK5FAo_34ZOKwTbYU9ooMjrjwkiPc7qA/edit#heading=h.8j7ss5mualq6

prestona commented 5 years ago

@aslom, using the contents of kafka.properties as the key/values in a secret would certainly simplify the use-case where Kafka configuration already exists elsewhere.

I'm keen that there is also a way to reference the SASL username, password and bootstrap servers when they are stored elsewhere, as this supports use-cases where the Kafka configuration is stored in the cluster in a secret that doesn't conform exactly to the Kafka configuration property names. This could be achieved by supporting valueFrom / secretKeyRef properties for the overrides in the KafkaSource / KafkaChannel CRDs.

Finally, I think that using Java's sasl.jaas.config property is a bit ugly:

sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="T7f3xuwnKG0Q5V71" password="Xwup9VRp9diu4CD04vHLBIe9cLlAy3QP";

I would prefer lifting librdkafka's configuration properties for SASL, specifically sasl.username and sasl.password. Using separate properties for the username / password also makes it possible to reference values from another secret via valueFrom / secretKeyRef properties.
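To illustrate that the two forms are interconvertible, a controller accepting the librdkafka-style pair could translate it into the Java client's sasl.jaas.config string internally (hypothetical helper, not part of any existing Knative code):

```go
package main

import "fmt"

// jaasConfig builds the Java client's sasl.jaas.config value for the PLAIN
// mechanism from librdkafka-style sasl.username / sasl.password properties.
// %q adds the double quotes the JAAS format expects around the values.
// (Hypothetical translation helper for illustration.)
func jaasConfig(username, password string) string {
	return fmt.Sprintf(
		`org.apache.kafka.common.security.plain.PlainLoginModule required username=%q password=%q;`,
		username, password)
}

func main() {
	fmt.Println(jaasConfig("user123", "passw0rd"))
	// org.apache.kafka.common.security.plain.PlainLoginModule required username="user123" password="passw0rd";
}
```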

aslom commented 5 years ago

@prestona I have now an example how that would work in https://github.com/knative/eventing-contrib/compare/master...aslom:consistent-knative-kafka-config

I am finishing the write-up and will open a PR as soon as the basic properties work.

I will see what can be done to support valueFrom / secretKeyRef for source and channel CR.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.