kyma-project / kyma

Kyma is an opinionated set of Kubernetes-based modular building blocks, including all necessary capabilities to develop and run enterprise-grade cloud-native applications.
https://kyma-project.io
Apache License 2.0

kne trigger dispatcher deployment disappearing #9758

Closed p4p4 closed 3 years ago

p4p4 commented 3 years ago

Env:

Kyma version: 1.13.0 in Azure K8s
Eventing config: Kafka based with Azure Event Hub
Eventing source: CCv2

Description:

Eventing broke down completely. The last event arrived at 5:30 in the morning (see screenshot). There is no deployment for the kne-trigger-dispatcher, only for the dispatchers. Also note that the dispatcher deployments were re-created at roughly that time.

image

➜  ~ k -n knative-eventing get deployments
NAME                                         READY   UP-TO-DATE   AVAILABLE   AGE
eventing-controller                          1/1     1            1           129d
eventing-webhook                             1/1     1            1           129d
hybris-ccv2-d1-kyma-integration-dispatcher   1/1     1            1           8h
hybris-ccv2-p1-kyma-integration-dispatcher   1/1     1            1           8h
hybris-ccv2-s1-kyma-integration-dispatcher   1/1     1            1           8h
knative-eventing-kafka-channel-controller    1/1     1            1           91d
knative-kafka-channel                        1/1     1            1           44d
natss-ch-controller                          1/1     1            1           128d
natss-ch-dispatcher                          1/1     1            1           128d
sources-controller                           1/1     1            1           129d

Note that the integration-dispatcher deployments were re-created 8 hours ago, and the linked kne-trigger-dispatchers are missing.

We restarted these two pods:

knative-eventing-kafka-channel-controller-6b7bb469b5-tx6j7 
knative-kafka-channel-7ff47b9b7-v7224

as they had issues connecting to Azure Event Hub:

{"level":"warn","ts":"2020-10-23T11:29:17.846Z","caller":"producer/producer.go:142","msg":"Kafka Error","error":"sasl_ssl://event-hubs-kyma-*********.servicebus.windows.net:9093/bootstrap: Connect to ipv4#13.69.64.14:9093 failed: Connection refused (after 100ms in state CONNECT)"

After that, the kne-trigger-dispatchers were back again and events were flowing again.
image

travis-minke-sap commented 3 years ago

Some initial thoughts after looking at logs and Grafana a bit...

travis-minke-sap commented 3 years ago

After some further investigation we have a possible explanation for the scenario described by this Issue. We are not able to conclusively state that this hypothesis is accurate, but it is the best explanation we've found that fits the description.

The deprecated knative-kafka implementation relies upon cross-namespace owner references, which are not supported by the Kubernetes specification. Despite this, K8s usually handles cascading garbage collection for these resources successfully. It seems, however, that there are scenarios in which the Kubernetes garbage collector can unexpectedly delete such resources. See the following issue for details (specifically the comments on Feb 7 and May 4, which are similar to this scenario):

https://github.com/kubernetes/kubernetes/issues/65200
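To make the owner-reference problem concrete, here is a minimal Python sketch of how one might flag cross-namespace owner references in object metadata that has already been fetched. An `ownerReferences` entry carries no namespace field, so an owner in another namespace cannot be addressed properly. The function name, the UID index, and the sample objects (names, namespaces, UIDs) are illustrative assumptions, not data taken from this cluster.

```python
from typing import Dict, List


def cross_namespace_owners(dependent: dict, namespace_by_uid: Dict[str, str]) -> List[str]:
    """Return the UIDs of owners living in a different namespace than `dependent`.

    `namespace_by_uid` is a hypothetical index (owner UID -> namespace) built by
    the caller; cluster-scoped owners are represented by an empty namespace.
    """
    metadata = dependent.get("metadata", {})
    dependent_ns = metadata.get("namespace", "")
    offenders = []
    for ref in metadata.get("ownerReferences", []):
        owner_ns = namespace_by_uid.get(ref["uid"])
        if owner_ns is None:
            continue  # owner not in the index; nothing to compare
        # Cluster-scoped owners (empty namespace) are allowed for any dependent.
        if owner_ns and owner_ns != dependent_ns:
            offenders.append(ref["uid"])
    return offenders


# Hypothetical sample: a dispatcher Deployment in knative-eventing that is
# owned by a KafkaChannel in kyma-integration and by a same-namespace object.
dispatcher = {
    "metadata": {
        "name": "example-dispatcher",
        "namespace": "knative-eventing",
        "ownerReferences": [
            {"kind": "KafkaChannel", "name": "example-channel", "uid": "uid-channel"},
            {"kind": "ConfigMap", "name": "example-local", "uid": "uid-local"},
        ],
    }
}
index = {"uid-channel": "kyma-integration", "uid-local": "knative-eventing"}

print(cross_namespace_owners(dispatcher, index))  # ['uid-channel']
```

Any UID reported by such a check points at a dependent that the garbage collector may treat as orphaned, which would fit the hypothesis above.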

We are theorizing that something similar occurred here, and the Kubernetes garbage collector deleted the Dispatcher's Deployment. It is still unclear why the Controller only re-reconciled the Dispatcher Deployments in the kyma-integration namespace, but not in the user namespaces (dgl-p1, etc).

Also, it should be noted that the broker-filter logs are still full of errors/retries which puts unnecessary load on the eventing infrastructure. The offending Subscriptions should be fixed or stopped to alleviate this burden.

And finally (fwiw)... I was able to verify that the KafkaChannel resources appear to NOT have been deleted/recreated during this downtime.

k15r commented 3 years ago

Workaround delivered (updated channel controller with configurable timeout for forced reconcile).
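The idea behind such a workaround can be sketched roughly as follows: at a fixed interval, compare the deployments that should exist with those that actually exist, and re-create any that are missing even if no change event was observed. This is a minimal Python illustration only. The real channel controller code is not shown in this thread, and all names here (`forced_reconcile`, `run_forced_reconcile`, the deployment names, the interval) are assumptions.

```python
import time
from typing import Callable, Iterable, List, Set


def forced_reconcile(
    expected: Iterable[str],
    list_existing: Callable[[], Set[str]],
    create: Callable[[str], None],
) -> List[str]:
    """Re-create every expected deployment that is currently missing."""
    existing = list_existing()
    recreated = []
    for name in sorted(expected):
        if name not in existing:
            create(name)
            recreated.append(name)
    return recreated


def run_forced_reconcile(
    reconcile_once: Callable[[], object],
    interval_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run `reconcile_once` a fixed number of times, pausing between runs."""
    for _ in range(rounds):
        reconcile_once()
        sleep(interval_seconds)


# In-memory stand-in for the cluster; the names are hypothetical.
cluster = {"example-dispatcher"}
expected = ["example-dispatcher", "example-kne-trigger-dispatcher"]

run_forced_reconcile(
    lambda: forced_reconcile(expected, lambda: set(cluster), cluster.add),
    interval_seconds=0,  # assumed value; a real interval would be configurable
    rounds=2,
)
print(sorted(cluster))
```

With a forced interval like this, a deployment removed by the garbage collector would be restored at the next run instead of staying missing until a manual restart.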