knative-extensions / eventing-kafka

Kafka integrations with Knative Eventing.
Apache License 2.0

Kafka Source still shown as ready when topic is deleted #760

Open steven0711dong opened 3 years ago

steven0711dong commented 3 years ago

Describe the bug: When a user deletes the topic, the KafkaSource CR status shows an error, but the overall status is still shown as ready.

Expected behavior: The overall ready status should reflect the precise error.

To Reproduce: Create a KafkaSource, make sure it is ready and receiving events, and then delete the topic.

Knative release version

Additional context Add any other context about the problem here such as proposed priority
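
To illustrate the expected behavior, here is a minimal sketch in which a missing topic forces the overall ready condition to false and carries the precise error. It is not the KafkaSource implementation: the `Condition` type, the `topicLister` interface, and the `TopicsExist` and `Deployed` condition names are all hypothetical, and an in-memory fake stands in for the Kafka admin API.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type    string
	OK      bool
	Message string
}

// topicLister is a hypothetical view of a Kafka admin API: it returns the
// set of topic names that currently exist on the cluster.
type topicLister interface {
	ListTopics() (map[string]bool, error)
}

// checkTopics builds a "TopicsExist" condition for the topics a source consumes.
func checkTopics(admin topicLister, topics []string) Condition {
	existing, err := admin.ListTopics()
	if err != nil {
		return Condition{"TopicsExist", false, "cannot list topics: " + err.Error()}
	}
	for _, t := range topics {
		if !existing[t] {
			return Condition{"TopicsExist", false, fmt.Sprintf("topic %q not found", t)}
		}
	}
	return Condition{"TopicsExist", true, ""}
}

// overallReady is true only if every dependent condition is true. Otherwise
// it surfaces the type and message of the first failing condition.
func overallReady(conds ...Condition) Condition {
	for _, c := range conds {
		if !c.OK {
			return Condition{"Ready", false, c.Type + ": " + c.Message}
		}
	}
	return Condition{"Ready", true, ""}
}

// fakeAdmin is an in-memory topicLister used only for this demonstration.
type fakeAdmin map[string]bool

func (f fakeAdmin) ListTopics() (map[string]bool, error) { return f, nil }

func main() {
	deployed := Condition{"Deployed", true, ""}
	admin := fakeAdmin{"orders": true}
	fmt.Println(overallReady(deployed, checkTopics(admin, []string{"orders"})))

	delete(admin, "orders") // simulate out-of-band deletion of the topic
	fmt.Println(overallReady(deployed, checkTopics(admin, []string{"orders"})))
}
```

The point of the sketch is that the topic check is one of the inputs to the aggregate condition, so an error recorded in a sub-condition cannot coexist with an overall ready state.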

lionelvillard commented 3 years ago

@devguyio @matzew @travis-minke-sap how do you handle this scenario in the channel implementations? I wonder if there is something we could leverage.

travis-minke-sap commented 3 years ago

Yeah, interesting...

We (distributed channel) generally haven't done anything special to detect / correct such external manual deletion of Kafka Topics. We have just assumed that the KafkaChannel CRD is the "owner" of the Topic, and that users are expected not to mess with them out-of-band.

Without trying it out... I would assume that the Receiver / Dispatcher would start logging errors and that if the controller restarted it would recreate the Topic. Not sure of the Status in the interim but it probably isn't handled accurately.

ntx-ben commented 3 years ago

I've also noticed that deleting a KafkaChannel results in the eventing-kafka-channel-controller crashing (using v0.24.1):

{"level":"info","ts":"2021-07-16T21:35:28.537Z","logger":"eventing-kafka-channel-controller","caller":"kafkachannel/dispatcher.go:110","msg":"Successfully Finalized Dispatcher Deployment","knative.dev/pod":"eventing-kafka-channel-controller-84d8fd46d7-spxf4","knative.dev/controller":"knative.dev.eventing-kafka.pkg.channel.distributed.controller.kafkachannel.Reconciler","knative.dev/kind":"messaging.knative.dev.KafkaChannel","knative.dev/traceid":"fac8ea5e-4d12-4877-86b7-09df575bcc75","knative.dev/key":"default/my-kafka-channel","Channel":"default/my-kafka-channel"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1a6fcb3]
goroutine 148 [running]:
knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel.(*Reconciler).deleteTopic(0xc000639400, 0x22b1d98, 0xc000cf7020, 0xc0002681b0, 0x18, 0xc0002681b0, 0x18)
    knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel/topic.go:137 +0x93
knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel.(*Reconciler).finalizeKafkaTopic(0xc000639400, 0x22b1d98, 0xc000cf7020, 0xc0001bb040, 0x0, 0x0)
    knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel/topic.go:82 +0x31e
knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel.(*Reconciler).FinalizeKind(0xc000639400, 0x22b1d98, 0xc000cf7020, 0xc0001bb040, 0x0, 0x0)
    knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel/reconciler.go:177 +0x645
knative.dev/eventing-kafka/pkg/client/injection/reconciler/messaging/v1beta1/kafkachannel.(*reconcilerImpl).Reconcile(0xc000639540, 0x22b1d98, 0xc000cf6ed0, 0xc000268108, 0x18, 0xc00057e2f8, 0x22b1d98)
    knative.dev/eventing-kafka/pkg/client/injection/reconciler/messaging/v1beta1/kafkachannel/reconciler.go:259 +0x1011
knative.dev/pkg/controller.(*Impl).processNextWorkItem(0xc00073a600, 0xc00059f700)
    knative.dev/pkg@v0.0.0-20210622173328-dd0db4b05c80/controller/controller.go:531 +0x5e4
knative.dev/pkg/controller.(*Impl).RunContext.func3(0xc000262020, 0xc00073a600)
    knative.dev/pkg@v0.0.0-20210622173328-dd0db4b05c80/controller/controller.go:468 +0x53
created by knative.dev/pkg/controller.(*Impl).RunContext
    knative.dev/pkg@v0.0.0-20210622173328-dd0db4b05c80/controller/controller.go:466 +0x1a5

travis-minke-sap commented 3 years ago

> I've also noticed that deleting a KafkaChannel results in the eventing-kafka-channel-controller crashing (using v0.24.1):

I made a very quick attempt at reproducing this using Strimzi but didn't see the same behavior. If I understand correctly the error is caused by deleting a KafkaChannel whose backing Kafka Topic has already been deleted? Generally the logic should handle this case as a no-op (meaning... the topic should be deleted and it doesn't exist so there's nothing to do).
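
The no-op handling described above can be sketched as follows. This is an illustration under stated assumptions, not the controller's code: `topicDeleter`, `finalizeTopic`, and the `errUnknownTopic` sentinel are hypothetical names, and the sentinel stands in for whatever "topic does not exist" error the real Kafka client returns.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownTopic is a hypothetical sentinel standing in for the client's
// "topic does not exist" error.
var errUnknownTopic = errors.New("unknown topic or partition")

// topicDeleter is a hypothetical, minimal view of a Kafka admin client.
type topicDeleter interface {
	DeleteTopic(name string) error
}

// finalizeTopic deletes the topic and treats "already gone" as success,
// which makes finalization idempotent.
func finalizeTopic(admin topicDeleter, name string) error {
	err := admin.DeleteTopic(name)
	if err == nil || errors.Is(err, errUnknownTopic) {
		return nil
	}
	return fmt.Errorf("deleting topic %q: %w", name, err)
}

// fakeCluster is an in-memory topicDeleter used only for this demonstration.
type fakeCluster map[string]bool

func (c fakeCluster) DeleteTopic(name string) error {
	if !c[name] {
		return errUnknownTopic
	}
	delete(c, name)
	return nil
}

func main() {
	cluster := fakeCluster{"my-kafka-channel": true}
	// First call removes the topic; second call finds nothing to do.
	fmt.Println(finalizeTopic(cluster, "my-kafka-channel"))
	fmt.Println(finalizeTopic(cluster, "my-kafka-channel"))
}
```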

The actual failure above is most likely that the Reconciler's Kafka AdminClient is nil when it's trying to delete the Topic. The Reconciler re-creates the AdminClient on every reconciliation loop (no re-use due to Sarama client timeout issues). Earlier in the logs do you see an error with "Failed To Create Kafka AdminClient"? If so can you provide that info?
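
If the nil AdminClient is indeed the cause, one defensive option is to return an error rather than dereference it. The sketch below only illustrates that idea and is not the actual fix: the `reconciler` struct, the `adminClient` interface, and `errNoAdminClient` are hypothetical simplifications of the real types.

```go
package main

import (
	"errors"
	"fmt"
)

// adminClient is a hypothetical, minimal view of a Kafka admin client.
type adminClient interface {
	DeleteTopic(name string) error
}

// reconciler is a simplified stand-in: the admin client is rebuilt on each
// reconciliation and stays nil if creation failed.
type reconciler struct {
	admin adminClient
}

// errNoAdminClient is a hypothetical error for the missing-client case.
var errNoAdminClient = errors.New("kafka admin client is not available")

// deleteTopic returns an error instead of dereferencing a nil client, so the
// caller can retry later rather than crash the controller.
func (r *reconciler) deleteTopic(name string) error {
	if r.admin == nil {
		return fmt.Errorf("cannot delete topic %q: %w", name, errNoAdminClient)
	}
	return r.admin.DeleteTopic(name)
}

// okAdmin is a trivial adminClient used only for this demonstration.
type okAdmin struct{}

func (okAdmin) DeleteTopic(string) error { return nil }

func main() {
	r := &reconciler{} // admin client creation failed, field is nil
	fmt.Println(r.deleteTopic("my-kafka-channel"))

	r.admin = okAdmin{} // a later reconciliation obtained a client
	fmt.Println(r.deleteTopic("my-kafka-channel"))
}
```

Returning an error here assumes the caller requeues failed work items, so the deletion is attempted again once a client can be created.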

If you can reproduce the panic, we're definitely interested in understanding and fixing it - probably should create a separate Issue detailing the reproduction steps, etc... thanks!

ntzlqx commented 3 years ago

We encountered the same panic. I am able to reproduce it if I nuke the entire Kafka cluster: the AdminClient then fails with the stack trace above.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

lionelvillard commented 2 years ago

/remove-lifecycle stale

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

lionelvillard commented 2 years ago

/remove-lifecycle stale
/triage accepted