knative-extensions / eventing-kafka-broker

Alternate Kafka Broker implementation.

Migration to new KafkaChannel does not take into account authentication defaults from old implementation #2213

Closed mgencur closed 2 years ago

mgencur commented 2 years ago

Describe the bug The previous implementation of the channel did not require protocol and sasl.mechanism to be present in the Secret used for authentication. The protocol setting is now required, otherwise the Broker fails with this error:

reason: 'error getting cluster admin config: protocol required (key: protocol)
      supported protocols: [PLAINTEXT SASL_PLAINTEXT SSL SASL_SSL]'

The sasl.mechanism now defaults to PLAIN as implemented in https://github.com/knative-sandbox/eventing-kafka-broker/issues/2117, but if the previous implementation was using some other mechanism, the new default (PLAIN) will not work with the existing Kafka cluster. This effectively breaks existing deployments: the KafkaChannel stops working during the upgrade.

Expected behavior The protocol and sasl.mechanism are properly migrated when moving from the old to the new channel implementation.

To Reproduce Assuming a Strimzi-installed Kafka in the kafka namespace with the following listener:

       - name: sasl
         port: 9094
         type: internal
         tls: true
         authentication:
           type: scram-sha-512

and a KafkaUser named my-sasl-user.
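
For reference, a minimal KafkaUser manifest matching this setup could look like the following sketch (the Kafka cluster name my-cluster is an assumption, not from the report):

   apiVersion: kafka.strimzi.io/v1beta2
   kind: KafkaUser
   metadata:
     name: my-sasl-user
     namespace: kafka
     labels:
       strimzi.io/cluster: my-cluster  # assumed cluster name
   spec:
     authentication:
       type: scram-sha-512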

Note: If the secret is created as shown below, the new implementation works as expected after the upgrade:

oc create secret --namespace knative-eventing generic my-sasl-secret \
      --from-literal=ca.crt="$STRIMZI_CRT" \
      --from-literal=password="$SASL_PASSWD" \
      --from-literal=saslType="SCRAM-SHA-512" \
      --from-literal=user="my-sasl-user" \
      --from-literal=protocol="SASL_SSL" \
      --from-literal=sasl.mechanism="SCRAM-SHA-512"
# The protocol and sasl.mechanism keys are the additions.

Knative release version 1.2.3

pierDipi commented 2 years ago

Thanks Martin!

/area channel
/triage accepted
/assign @aliok

aliok commented 2 years ago

@pierDipi @matzew @mgencur

The issue is that we use a different key in the secret now.

Old channel impl looks for saslType and the new one looks for protocol.

I don't want to touch user secrets during the migration (sounds bad, but @mgencur to confirm), so I will make the code read the old key when the new key is missing.

mgencur commented 2 years ago

It would be saslType in the old impl. vs. sasl.mechanism in the new one, right? I guess it's alright to read saslType if it is present. But I'm not sure what to do about protocol. It needs to be determined from the old configuration somehow (not sure where it's stored)

pierDipi commented 2 years ago

sounds good!

aliok commented 2 years ago

It would be saslType in the old impl. vs. sasl.mechanism in the new one, right?

Good point. I was mixing things.

mgencur commented 2 years ago

I guess the protocol can be determined from the other parameters: we could pick one of PLAINTEXT, SASL_PLAINTEXT, SSL, or SASL_SSL based on the presence of ca.crt and saslType, maybe?
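
For illustration, a minimal Go sketch of that inference, keyed off which entries exist in the Secret (inferProtocol is a hypothetical helper name, not code from the repo):

   package main

   import "fmt"

   // inferProtocol guesses the security protocol from the keys present
   // in the Secret data: TLS if there is a CA cert, SASL if saslType is set.
   func inferProtocol(data map[string][]byte) string {
       _, hasCACert := data["ca.crt"]
       _, hasSASL := data["saslType"]
       switch {
       case hasCACert && hasSASL:
           return "SASL_SSL"
       case hasCACert:
           return "SSL"
       case hasSASL:
           return "SASL_PLAINTEXT"
       default:
           return "PLAINTEXT"
       }
   }

   func main() {
       data := map[string][]byte{
           "ca.crt":   []byte("..."),
           "saslType": []byte("SCRAM-SHA-512"),
       }
       fmt.Println(inferProtocol(data)) // prints SASL_SSL
   }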

pierDipi commented 2 years ago

yes, we can, but I'd recommend deprecating the old format and using the new one for consistency.

aliok commented 2 years ago

It needs to be determined from the old configuration somehow (not sure where it's stored)

Old logic is messy. There are multiple places with overriding mechanisms to determine the protocol... There's also backwards-compatibility logic for the pre-distributed-channel-era configmap format.

yes, we can, but I'd recommend deprecating the old format and using the new one for consistency.

+1 on this, but we cannot do it right now...

I think the best option is to have some logic (or even call the old code in eventing-kafka) to build the auth config if protocol is missing. Instead of carrying this over in the channel code itself, I would like to do it in the migration code. However, that means I will touch the secret.

If we don't touch the secret, we will have the same problem: either we need to tell users to update their secret manually, or we carry the fallback logic forever.

pierDipi commented 2 years ago

Another option is to create/update a new secret (that uses the new format) in the system namespace based on the user-referenced secret (that is in the old format) and we use the new secret internally

In the reconciler we will have something like this:

oldSecret := getUserSecret()
newSecret := reconcileNewSecret(oldSecret)
// use newSecret for the contract, etc.
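
As a rough sketch of that option, under the assumption that the new secret simply copies the legacy keys into the new format (hypothetical names throughout; inferProtocol is the guessing helper sketched earlier in this thread, and the naming scheme is made up):

   package main

   import (
       corev1 "k8s.io/api/core/v1"
       metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
   )

   // newFormatSecret builds an internal, new-format copy of the user's
   // legacy secret, to be created/updated in the system namespace.
   func newFormatSecret(old *corev1.Secret, systemNamespace string) *corev1.Secret {
       data := make(map[string][]byte, len(old.Data))
       for k, v := range old.Data {
           data[k] = v
       }
       // The legacy saslType key maps to the new sasl.mechanism key.
       if v, ok := old.Data["saslType"]; ok {
           data["sasl.mechanism"] = v
       }
       // Fill in protocol when the legacy secret doesn't carry it.
       if _, ok := data["protocol"]; !ok {
           data["protocol"] = []byte(inferProtocol(old.Data))
       }
       return &corev1.Secret{
           ObjectMeta: metav1.ObjectMeta{
               Name:      old.Name + "-internal", // assumed naming scheme
               Namespace: systemNamespace,
           },
           Data: data,
       }
   }
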
pierDipi commented 2 years ago

The problem with touching external secrets is that they might be reconciled by a different system, and so we will end up either fighting (if done in the reconciler) or in a broken state (if done in the migration job).

matzew commented 2 years ago

we use the new secret internally

I like the idea of using this internally: it sounds very elegant, leaves the user secret as is, and we don't mess with 3rd parties.

Eventually, once we deprecate the old format (let's RELEASE NOTE that it is deprecated), we can remove the extra hop.

aliok commented 2 years ago

https://github.com/knative-sandbox/eventing-kafka-broker/pull/2216

Have a quick look? This touches the secret.

aliok commented 2 years ago

Another option is to create/update a new secret (that uses the new format) in the system namespace based on the user-referenced secret (that is in the old format) and we use the new secret internally

The issue is that we will carry this legacy logic until we ask users for manual intervention.

Well, anyway, I did the hard part in https://github.com/knative-sandbox/eventing-kafka-broker/pull/2216, which is determining what protocol the consolidated channel was using.

aliok commented 2 years ago

The problem with touching external secrets is that they might be reconciled by a different system, and so we will end up either fighting (if done in the reconciler) or in a broken state (if done in the migration job).

Very good point.

Another option is to create/update a new secret (that uses the new format) in the system namespace based on the user-referenced secret (that is in the old format) and we use the new secret internally

Why do we need a new secret? Can't we use the logic here: https://github.com/knative-sandbox/eventing-kafka-broker/pull/2216/files#diff-607160422e079eb8a24ab3f7b7bc514584688dcbc2a6e877efc85dbf2c569e3eR270-R278 to determine the protocol on-the-fly?

As explained here, I touched the secret in my PR because I didn't want to have this logic in the channel code. I wanted this crap code to only live in the migration as a one-off thing.

The data plane doesn't have to do anything with the secret either. So, IMO, there's maybe no point in creating the internal secret.

So, something like this:

protocol := string(secret.Data["protocol"])
if protocol == "" {
    protocol = determineProtocolFromLegacy()
}

I am not 100% sure we can delete config-kafka for a while, because in the legacy channel it is also used to build the auth config; not everything is in the secret and the CR. Alternatively, instead of only picking the good stuff from config-kafka to migrate into the new configmap, we might need to pick some more things (the Sarama config, etc., perhaps).

@matzew @pierDipi? I'd appreciate the feedback.

pierDipi commented 2 years ago

The problem with touching external secrets is that they might be reconciled by a different system, and so we will end up either fighting (if done in the reconciler) or in a broken state (if done in the migration job).

Very good point.

Another option is to create/update a new secret (that uses the new format) in the system namespace based on the user-referenced secret (that is in the old format) and we use the new secret internally

Why do we need a new secret? Can't we use the logic here: https://github.com/knative-sandbox/eventing-kafka-broker/pull/2216/files#diff-607160422e079eb8a24ab3f7b7bc514584688dcbc2a6e877efc85dbf2c569e3eR270-R278 to determine the protocol on-the-fly?

As explained here, I touched the secret in my PR because I didn't want to have this logic in the channel code. I wanted this crap code to only live in the migration as a one-off thing.

The data plane doesn't have to do anything with the secret either. So, IMO, there's maybe no point in creating the internal secret.

We have to read the secret in the data plane, so we have these options:

  1. use the "protocol" key from the new format by creating a new internal secret, or
  2. implement your fallback logic (currently in Go) in Java as well, or
  3. use the multi secret ref feature (used by KafkaSource) in the contract for the channel

Option 3 is the most flexible and allows specifying keys for particular TLS or SASL settings (I'm not sure whether it works in the channel case, though).

So, something like this:

protocol := string(secret.Data["protocol"])
if protocol == "" {
    protocol = determineProtocolFromLegacy()
}

I am not 100% sure we can delete config-kafka for a while, because in the legacy channel it is also used to build the auth config; not everything is in the secret and the CR. Alternatively, instead of only picking the good stuff from config-kafka to migrate into the new configmap, we might need to pick some more things (the Sarama config, etc., perhaps).

Sarama configs shouldn't be strictly needed in our case since we don't use Sarama consumers and producers

@matzew @pierDipi? I'd appreciate the feedback.

pierDipi commented 2 years ago

I don't see why option 3 shouldn't work, so I would investigate that one: we keep reading whatever user secret is provided, plus we need to determine the "protocol".

Next, we will implement the new format for the channel. For now, we can focus on keeping compatibility with the old format.

pierDipi commented 2 years ago

I am not 100% sure if we can delete config-kafka for a while. Because, in the legacy channel, it is also used to build the auth config. Not everything is in the secret and CR.

We can keep it around if we need it for this migration, until we deprecate and remove the old format.

matzew commented 2 years ago

I agree on trying the suggested option 3

pierDipi commented 2 years ago

For option 3.

Based on the content of the old format channel secret:

aliok commented 2 years ago

sarama configs shouldn't be strictly needed in our case since we don't use sarama consumers and producers

Yeah, but some of the TLS/SASL config was specified there directly. That's why I said we might need them too.

pierDipi commented 2 years ago

With https://github.com/knative-sandbox/eventing-kafka-broker/pull/2224, using this secret [1] will work.

[1]

  kubectl create secret --namespace "${SYSTEM_NAMESPACE}" generic strimzi-sasl-secret-legacy \
    --from-literal=ca.crt="$STRIMZI_CRT" \
    --from-literal=password="$SASL_PASSWD" \
    --from-literal=user="my-sasl-user" \
    --from-literal=saslType="SCRAM-SHA-512" \
    --dry-run=client -o yaml | kubectl apply -n "${SYSTEM_NAMESPACE}" -f -

aliok commented 2 years ago

/close

Fixed by https://github.com/knative-sandbox/eventing-kafka-broker/pull/2224

knative-prow[bot] commented 2 years ago

@aliok: Closing this issue.

In response to this (https://github.com/knative-sandbox/eventing-kafka-broker/issues/2213#issuecomment-1140820977):

/close

Fixed by https://github.com/knative-sandbox/eventing-kafka-broker/pull/2224

Instructions for interacting with me using PR comments are available here: https://git.k8s.io/community/contributors/guide/pull-requests.md. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository (https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:).