strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

Add Kafka bridge deployment handled via the CO #1625

Closed. ppatierno closed this issue 5 years ago.

ppatierno commented 5 years ago

In order to deploy the Kafka bridge alongside a Kafka cluster deployed by the Strimzi Cluster Operator, the following is a potential CR for describing the bridge itself.

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaBridge
metadata:
  name: my-kafka-bridge
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  consumer:
    tls:
      trustedCertificates:
        - secretName: my-secret
          certificate: ca.crt
    authentication:
      type: tls
      certificateAndKey:
        secretName: my-secret
        certificate: public.crt
        key: private.key
  producer:
    tls:
      trustedCertificates:
        - secretName: my-secret
          certificate: ca.crt
    authentication:
      type: scram-sha-512 # plain supported as well
      username: my-user
      passwordSecret:
        secretName: my-user
        password: my-password-key
  http:
    port: 8080

Currently, the bridge doesn't support accepting TLS-encrypted connections (i.e. HTTPS); it works with HTTP only. This could be addressed by putting a 3scale gateway in front of it, or by adding the feature to the bridge itself in the future. The same goes for user management.

At the same time, there is no support yet for accessing the Kafka cluster via TLS, but the feature has to be implemented, so it's better to have this configuration parameter in the CR already. The same applies to authentication against the Kafka cluster. Each of them can be set separately for the producer and the consumer (as with the Kafka Mirror Maker). Of course, we are talking about authenticating against the Kafka cluster each producer and consumer created on the bridge (one for each connected HTTP client). Such authentication could be different from the authentication of the HTTP clients connecting from outside, which is not implemented in the bridge right now (handled by a 3scale gateway?).

The current Kafka bridge implementation supports the AMQP protocol as well but I would not include it for now. In the future, it should just be a matter of adding a spec.amqp property with all the related fields for the configuration.
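Purely to illustrate the shape (these amqp fields are hypothetical placeholders, not an agreed design), it could look something like:

spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  http:
    port: 8080
  amqp:
    port: 5672   # hypothetical AMQP listener port
    # ...other AMQP-related configuration fields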

[UPDATE]

Moving TLS and authentication to the root node, without any difference between consumer and producer.

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaBridge
metadata:
  name: my-kafka-bridge
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  tls:
    trustedCertificates:
      - secretName: my-secret
        certificate: ca.crt
  authentication:
    type: tls
    certificateAndKey:
      secretName: my-secret
      certificate: public.crt
      key: private.key
  http:
    port: 8080

The authentication section could support plain and scram-sha-512 as well in the following way.

...
  authentication:
    type: scram-sha-512 # plain supported as well
    username: my-user
    passwordSecret:
      secretName: my-user
      password: my-password-key
...
ppatierno commented 5 years ago

@scholzj @tombentley @stanlyDoge @kyguy any feedback is much appreciated.

scholzj commented 5 years ago

Some comments:

ppatierno commented 5 years ago

Since this is not a Kafka component ... should it be just Bridge?

But it's a bridge for Kafka. When you are on Kube/OpenShift asking for the custom resources, it's more understandable to run oc get kafkabridge than oc get bridge, which could refer to something different (maybe another custom resource not in Strimzi).

The bridge was born for AMQP. Then I abstracted a layer for using it with HTTP as well. Of course, if we think that the AMQP support will not be needed anymore (think about the AMQP Kafka Connect connector instead), the underlying common layer could be simplified for supporting HTTP only.

Do we really want the port as configurable?

Why not?

I think we also need to add healthchecks, templates, resources, etc. (no need to add them to the example, but we should keep them in mind)

Yes, agreed. Let's keep them in mind.
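As a placeholder for that discussion, a sketch of how those fields might look if they follow the pattern used by the other Strimzi resources (the exact field names are assumptions, not a final proposal):

spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  http:
    port: 8080
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: "1"
      memory: 512Mi
  livenessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 5
  readinessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 5
  template:
    pod:
      metadata:
        labels:
          app: my-kafka-bridge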

I do not understand the idea behind the separate consumer and producer configuration. I think it should be just a single connection here.

Of course each HTTP client has its own producer and consumer. Instead of "single connection" I would say that a "single common configuration" could make sense for producing and consuming, at least from the TLS and auth point of view. We could then move the tls and authentication sections just under spec, removing the difference between consumer and producer.

scholzj commented 5 years ago

But it's a bridge for Kafka. When you are on Kube/OpenShift asking for the custom resources, it's more understandable to run oc get kafkabridge than oc get bridge, which could refer to something different (maybe another custom resource not in Strimzi).

Well, it could also be StrimziBridge :-). Anyway, if I'm the only one who sees the association with Kafka there as problematic, then we should keep your suggestion.

Do we really want the port as configurable?

Why not?

It is something we will need to maintain and support. So why yes?

Of course each HTTP client has its own producer and consumer. Instead of "single connection" I would say that a "single common configuration" could make sense for producing and consuming, at least from the TLS and auth point of view. We could then move the tls and authentication sections just under spec, removing the difference between consumer and producer.

Right, I do not really care about the actual TCP connections (although it is not clear to me why not pool them and share one connection for multiple consumers / producers - but I'll let you deal with the implementation there ;-)). I meant the configuration - I do not see any reason for separate consumer and producer configuration. As you suggest, we should just move it a level higher and remove the consumer and producer stuff.

tombentley commented 5 years ago

Just a wild crazy Wednesday morning idea... for the bootstrapServers, I wonder whether, as well as supporting a literal address, we should also support specifying by reference a bootstrap Service and/or a Kafka.spec.kafka.listeners.* listener.

I agree about pulling tls and authentication up into the spec. Also about resources, templates etc. Rather than just keeping them in mind, @ppatierno can you post a revised example, so we can all be on the same page about what exactly is being proposed? We also need to consider logging and jvm options; I don't know whether you had those in mind already.

Can you elaborate a little about user management? I think you are saying the bridge will act as a single Kafka user, and 3scale would authenticate (and authorize?) bridge users separately, but what if we wanted some kind of authentication passthrough mode? (I'm not saying we should support this, but I am interested in how the CR for that might look.)

ppatierno commented 5 years ago

Just a wild crazy Wednesday morning idea... for the bootstrapServers, I wonder whether, as well as supporting a literal address, we should also support specifying by reference a bootstrap Service and/or a Kafka.spec.kafka.listeners.* listener.

Isn't that too tied to the bridge running on Kubernetes/OpenShift, while it could be used on bare metal as well?

Can you elaborate a little about user management? I think you are saying the bridge will act as a single Kafka user, and 3scale would authenticate (and authorize?) bridge users separately, but what if we wanted some kind of authentication passthrough mode? (I'm not saying we should support this, but I am interested in how the CR for that might look.)

As far as I understood, that was the initial idea from @scholzj when he proposed this "integration" with 3scale: offloading user management to 3scale while the bridge acts as just one user against the Kafka cluster (as "others" do ... ;)).

tombentley commented 5 years ago

Isn't that too tied to the bridge running on Kubernetes/OpenShift, while it could be used on bare metal as well?

Well, you're describing a CR, so the whole discussion assumes Kubernetes/OpenShift. The operator would need to resolve the reference and pass a plain address to the bridge.

ppatierno commented 5 years ago

Well, you're describing a CR, so the whole discussion assumes Kubernetes/OpenShift.

Right :-) My mind is already at Disneyland.

scholzj commented 5 years ago

Just a wild crazy Wednesday morning idea... for the bootstrapServers, I wonder whether, as well as supporting a literal address, we should also support specifying by reference a bootstrap Service and/or a Kafka.spec.kafka.listeners.* listener.

I would be quite interested in what you have in mind here. Could you post some ideas?

tombentley commented 5 years ago

Very roughly, the idea was that the CR could support these alternative ways of specifying the bootstrap servers:

  1. A literal address (basically the same as Paolo's suggestion, but different syntax)

    bootstrapServers:
      value: my-cluster-kafka-bootstrap:9092

    Presumably this sets the value of some env var in the Deployment container.

  2. A reference to a Service

    bootstrapServers:
      valueFrom:
        serviceRef:
          name: my-cluster-kafka-bootstrap

    This is analogous to/inspired by the API for supplying an env var value from a Secret (a comparison sketch of that pattern follows at the end of this comment). The operator would resolve the address from the Service, setting the value of the same env var in the Deployment container.

  3. A reference to a listener in a Kafka cluster:

    bootstrapServers:
      valueFrom:
        kafkaClusterRef:
          name: my-cluster
          listener: plain

    The operator would resolve the address from the Kafka.status.listeners, setting the value of the same env var in the Deployment container.

That's as much as I've thought about it. If the idea holds water (and maybe it doesn't), we could use a similar pattern for KafkaMirrorMaker.
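For comparison, this is the existing Kubernetes pattern for supplying an env var value from a Secret that the serviceRef idea borrows from (the env var name is just a placeholder):

env:
  - name: KAFKA_BRIDGE_BOOTSTRAP_SERVERS   # placeholder name
    valueFrom:
      secretKeyRef:
        name: my-secret
        key: bootstrap-servers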

scholzj commented 5 years ago

A reference to a Service

The serviceRef would need a port and possibly an optional namespace field as well. And TBH I'm not sure it is a significant improvement over typing <servicename>[.<namespace>]:<port>. Is it worth the maintenance effort?

A reference to a listener in a Kafka cluster

I kind of like this idea, but I do not think there is time to implement it. And again, not sure how much value it really has compared to the maintenance effort. Also, the status field is not supported yet, so it is hard to build on it (but maybe we do not need it, since it can be very easily calculated). So I think this might be a future option.

A literal address (basically the same as Paolo's suggestion, but different syntax)

I assume this would be a prerequisite for implementing the other two options. But since we do not have the other two options implemented yet and we might not be able to implement them (dependencies on status and time constraints), I find it a bit weird to use this here but not anywhere else. This would create a ton of questions on support channels.

tombentley commented 5 years ago

If we're prepared to write a custom deserializer, I think it would be possible to support

bootstrapServers: my-cluster-kafka-bootstrap:9092

initially, and later support the others (either just the by-reference alternatives or the value: form as well). That's possible because the deserializer gets to peek at the tokens in the JSON stream: if it sees a string it knows it's the above case, but if it sees an object it parses it as an object. The fabric8 IntOrString does something similar.
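To make the two shapes concrete (both are just the forms proposed above, not implemented syntax):

# String token: the deserializer treats it as a literal address
bootstrapServers: my-cluster-kafka-bootstrap:9092

# Object token: the deserializer parses it as a reference
bootstrapServers:
  valueFrom:
    serviceRef:
      name: my-cluster-kafka-bootstrap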

And again, not sure how much value it really has compared to the maintenance effort.

I don't really see that it comes with a high maintenance effort. I mean, I don't think it's especially fragile as a design. Obviously there's an initial implementation burden, but really that would be something in operator-common which could then be reused for other components which need to connect to a cluster. I think some of its value is that it's a bit more self-documenting than just an address, but I agree this doesn't really enable you to do something you couldn't do with just a literal address.

scholzj commented 5 years ago

If we're prepared to write a custom deserializer ...

Ok, so we go on with Paolo's suggested form and try to look at this in the next release for all places where we specify the address?

I don't really see that it comes with a high maintenance effort.

I'm not sure ... writing and maintaining unit tests and system tests, etc. It might be opinion-based how much value it adds ... but it for sure adds non-zero effort to make sure it still works after another 5 releases.

sknot-rh commented 5 years ago

After the decision to postpone sticky sessions and a discussion with @tombentley, Tom proposes to limit the bridge replicas to a minimum of 0 and a maximum of 1. WDYT, @ppatierno @scholzj?

scholzj commented 5 years ago

IMHO there might be use cases like producing telemetry from IoT devices which can use scaling and do not suffer from any problems with the consumer scaling. So I'm not sure we should do it. WDYT @ppatierno? You are the IoT expert ;-)

tombentley commented 5 years ago

True, producers are not a problem, so I guess we shouldn't prohibit the bridge from scaling up.

ppatierno commented 5 years ago

Yeah, luckily producing is not a problem for the bridge, and it is a good use case for IoT telemetry as Jakub pointed out. BTW, if the same bridge deployment with more replicas is used for consuming as well, we start to have the well-known problem. Tom was suggesting allowing the producing and consuming parts of the bridge to be enabled/disabled separately, which could be useful in this use case.

tombentley commented 5 years ago

Tom was suggesting allowing the producing and consuming parts of the bridge to be enabled/disabled separately, which could be useful in this use case.

To elaborate: I see that as a simple extra layer of security. If you can deploy a bridge in produce-only mode (possibly with a "topic mask" of the topics which can be produced to), that's a simple extra line of defence in addition to Kafka ACLs. That idea is orthogonal to the scaling problem.

scholzj commented 5 years ago

I do not think there is time for that, and it seems like kind of a strange thing to me. But there is an easy solution - the bridge should normally be authenticating against the broker, and you can limit producing in the ACLs.
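For illustration, the bridge's Kafka user could be limited to producing with simple authorization roughly like this (topic and user names are placeholders):

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaUser
metadata:
  name: my-bridge-user
  labels:
    strimzi.io/cluster: my-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      # allow producing to the telemetry topic, nothing else
      - resource:
          type: topic
          name: telemetry
        operation: Write
      - resource:
          type: topic
          name: telemetry
        operation: Describe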

tombentley commented 5 years ago

I didn't say this was something we should do, or that we should try to do right now. It was an idea. I explicitly said it was an extra line of defence in addition to ACLs. I was merely trying to put Paolo's comment in context.

scholzj commented 5 years ago

This should be already addressed and the issue can be closed.