kyma-project / kyma

Kyma is an opinionated set of Kubernetes-based modular building blocks, including all necessary capabilities to develop and run enterprise-grade cloud-native applications.
https://kyma-project.io
Apache License 2.0

MigrateFromEventMeshUpgradeTest test is failing on upgrade jobs. #7722

Closed Ressetkk closed 4 years ago

Ressetkk commented 4 years ago

Description: The MigrateFromEventMeshUpgradeTest test is failing on the post-master-kyma-gke-upgrade jobs.

Fetched Log:

All attempts fail:\n#1: trigger migrate-eventmesh-upgrade not ready: {{1 [{Broker True {2020-03-20 10:50:58 +0000 UTC} } {BrokerReady True {2020-03-20 10:51:22 +0000 UTC} } {Dependency True {2020-03-20 10:41:38 +0000 UTC} } {DependencyReady True {2020-03-20 10:51:22 +0000 UTC} } {Ready False {2020-03-20 10:51:22 +0000 UTC} NotSubscribed object is being deleted: subscriptions.messaging.knative.dev \"default-migrate-eventmesh--d01c3304-947f-4cfc-9586-48a0a5a9fbee\" already exists} {Subscribed False {2020-03-20 10:51:22 +0000 UTC} NotSubscribed object is being deleted: subscriptions.messaging.knative.dev \"default-migrate-eventmesh--d01c3304-947f-4cfc-9586-48a0a5a9fbee\" already exists} {SubscriberResolved True {2020-03-20 10:51:22 +0000 UTC} }]} http://migrate-eventmesh-upgrade.migratefromeventmeshupgradetest.svc.cluster.local:9000/v3/events}

Affected job logs:
https://status.build.kyma-project.io/view/gcs/kyma-prow-logs/logs/post-master-kyma-gke-upgrade/1240957520218951683
https://status.build.kyma-project.io/view/gcs/kyma-prow-logs/logs/post-master-kyma-gke-upgrade/1240956261105668096
https://status.build.kyma-project.io/view/gcs/kyma-prow-logs/logs/post-master-kyma-gke-upgrade/1240945691325370368

Ressetkk commented 4 years ago

The error log from Wednesday shows the same issue: https://status.build.kyma-project.io/view/gcs/kyma-prow-logs/logs/post-master-kyma-gke-upgrade/1240227712983896064

sayanh commented 4 years ago

Created PR https://github.com/kyma-project/kyma/pull/7721 to understand why the Trigger goes into an unready state. The Trigger's state is linked to its Subscription, which is also unready. The other tests are commented out in the PR.

sayanh commented 4 years ago

Here's what I think is happening:

{TypeMeta:{Kind:Trigger APIVersion:eventing.knative.dev/v1alpha1} ObjectMeta:{Name:migrate-eventmesh-upgrade GenerateName: Namespace:migratefromeventmeshupgradetest SelfLink:/apis/eventing.knative.dev/v1alpha1/namespaces/migratefromeventmeshupgradetest/triggers/migrate-eventmesh-upgrade UID:d58edca9-2bb5-4f14-88f3-e0d8076b8de4 ResourceVersion:14661 Generation:1 CreationTimestamp:2020-03-25 11:23:52 +0000 UTC DeletionTimestamp:\u003cnil\u003e DeletionGracePeriodSeconds:\u003cnil\u003e Labels:map[eventing.knative.dev/broker:default function:migrate-eventmesh-upgrade] Annotations:map[eventing.knative.dev/creator:system:serviceaccount:e2e-upgrade-test:test-e2e-upgrade-test eventing.knative.dev/lastModifier:system:serviceaccount:e2e-upgrade-test:test-e2e-upgrade-test] OwnerReferences:[] Initializers:nil Finalizers:[] ClusterName: ManagedFields:[]} Spec:{Broker:default Filter:0xc00053b370 Subscriber:{Ref:nil URI:http://migrate-eventmesh-upgrade.migratefromeventmeshupgradetest.svc.cluster.local:9000/v3/events}} Status:{Status:{ObservedGeneration:1 Conditions:[{Type:Broker Status:True Severity: LastTransitionTime:{Inner:2020-03-25 11:29:24 +0000 UTC} Reason: Message:} {Type:BrokerReady Status:True Severity: LastTransitionTime:{Inner:2020-03-25 11:29:50 +0000 UTC} Reason: Message:} {Type:Dependency Status:True Severity: LastTransitionTime:{Inner:2020-03-25 11:23:52 +0000 UTC} Reason: Message:} {Type:DependencyReady Status:True Severity: LastTransitionTime:{Inner:2020-03-25 11:29:50 +0000 UTC} Reason: Message:} {**Type:Ready Status:False** Severity: LastTransitionTime:{Inner:2020-03-25 11:29:50 +0000 UTC} Reason:NotSubscribed Message:object is being deleted: subscriptions.messaging.knative.dev \"default-migrate-eventmesh--d58edca9-2bb5-4f14-88f3-e0d8076b8de4\" already exists} {Type:Subscribed Status:False Severity: LastTransitionTime:{Inner:2020-03-25 11:29:50 +0000 UTC} Reason:NotSubscribed Message:object is being deleted: subscriptions.messaging.knative.dev 
\"default-migrate-eventmesh--d58edca9-2bb5-4f14-88f3-e0d8076b8de4\" already exists} {Type:SubscriberResolved Status:True Severity: LastTransitionTime:{Inner:2020-03-25 11:29:50 +0000 UTC} Reason: Message:}]} SubscriberURI:http://migrate-eventmesh-upgrade.migratefromeventmeshupgradetest.svc.cluster.local:9000/v3/events}}
{TypeMeta:{Kind:Subscription APIVersion:messaging.knative.dev/v1alpha1} ObjectMeta:{Name:default-migrate-eventmesh--d58edca9-2bb5-4f14-88f3-e0d8076b8de4 GenerateName: Namespace:migratefromeventmeshupgradetest SelfLink:/apis/messaging.knative.dev/v1alpha1/namespaces/migratefromeventmeshupgradetest/subscriptions/default-migrate-eventmesh--d58edca9-2bb5-4f14-88f3-e0d8076b8de4 UID:4747db6d-d14c-4169-ba19-1b9b9e8f73dd ResourceVersion:14636 Generation:2 CreationTimestamp:2020-03-25 11:29:21 +0000 UTC DeletionTimestamp:2020-03-25 11:29:48 +0000 UTC DeletionGracePeriodSeconds:0xc000044150 Labels:map[eventing.knative.dev/broker:default eventing.knative.dev/trigger:migrate-eventmesh-upgrade] Annotations:map[] OwnerReferences:[{APIVersion:eventing.knative.dev/v1alpha1 Kind:Trigger Name:migrate-eventmesh-upgrade UID:d58edca9-2bb5-4f14-88f3-e0d8076b8de4 Controller:0xc0000441ba BlockOwnerDeletion:0xc0000441b9}] Initializers:nil Finalizers:[subscription-controller] ClusterName: ManagedFields:[]} Spec:{DeprecatedGeneration:0 Channel:{Kind:NatssChannel Namespace: Name:default-kne-trigger UID: APIVersion:messaging.knative.dev/v1alpha1 ResourceVersion: FieldPath:} Subscriber:0xc00095e340 Reply:0xc00095e330 Delivery:\u003cnil\u003e} Status:{Status:{ObservedGeneration:1 Conditions:[{Type:AddedToChannel Status:True Severity: LastTransitionTime:{Inner:2020-03-25 11:29:22 +0000 UTC} Reason: Message:} {Type:ChannelReady Status:True Severity: LastTransitionTime:{Inner:2020-03-25 11:29:22 +0000 UTC} Reason: Message:} {Type:Ready Status:True Severity: LastTransitionTime:{Inner:2020-03-25 11:29:22 +0000 UTC} Reason: Message:} {Type:Resolved Status:True Severity: LastTransitionTime:{Inner:2020-03-25 11:29:22 +0000 UTC} Reason: Message:}]} PhysicalSubscription:{SubscriberURI:http://default-broker-filter.migratefromeventmeshupgradetest.svc.cluster.local/triggers/migratefromeventmeshupgradetest/migrate-eventmesh-upgrade/d58edca9-2bb5-4f14-88f3-e0d8076b8de4 ReplyURI: DeadLetterSinkURI:}}}

Reason: The pre-upgrade hook of knative-eventing deletes the Knative Subscription. The CRs are deleted successfully, but the eventing-controller reconciles the Triggers and recreates the Subscription right away. The timestamps show this, because the Subscription was created more than five minutes after its Trigger:
Trigger creation timestamp: 2020-03-25 11:23:52 +0000 UTC
Knative Subscription creation timestamp: 2020-03-25 11:29:21 +0000 UTC
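The gap between the two creation timestamps can be checked with a few lines of Python. This is only a sketch: the helper name `parse_go_time` is hypothetical, and the format string assumes the Go-style timestamps from the dump above, which all end in `+0000 UTC`.

```python
from datetime import datetime

# Assumed layout of the timestamps printed in the dump, e.g.
# "2020-03-25 11:23:52 +0000 UTC".
GO_TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z UTC"


def parse_go_time(value: str) -> datetime:
    """Parse a Go-style timestamp string into a timezone-aware datetime."""
    return datetime.strptime(value, GO_TIME_FORMAT)


# Values copied from the Trigger and Subscription dumps.
trigger_created = parse_go_time("2020-03-25 11:23:52 +0000 UTC")
subscription_created = parse_go_time("2020-03-25 11:29:21 +0000 UTC")

# A Subscription created minutes after its Trigger suggests it was recreated.
gap = subscription_created - trigger_created
print(gap)  # 0:05:29
```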

The new Subscription is then also marked for deletion (DeletionTimestamp: 2020-03-25 11:29:48 +0000 UTC), but it never gets cleaned up because the subscription-controller finalizer is still set on it.
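The condition described here (deletion requested, finalizer still present) can be detected from the object's metadata. Below is a minimal sketch, assuming the object is available as a dictionary in the shape returned by `kubectl get ... -o json`. The function name `is_pending_finalization` is hypothetical, and the sample object is a trimmed reconstruction of the Subscription from the dump, not real API output. Being in this state briefly is normal; it indicates a problem only when it persists.

```python
from typing import Any, Dict


def is_pending_finalization(obj: Dict[str, Any]) -> bool:
    """Return True if the object is marked for deletion but still has finalizers."""
    metadata = obj.get("metadata", {})
    marked_for_deletion = metadata.get("deletionTimestamp") is not None
    has_finalizers = bool(metadata.get("finalizers"))
    return marked_for_deletion and has_finalizers


# Trimmed reconstruction of the Subscription from the dump (assumed JSON shape).
subscription = {
    "metadata": {
        "name": "default-migrate-eventmesh--d58edca9-2bb5-4f14-88f3-e0d8076b8de4",
        "namespace": "migratefromeventmeshupgradetest",
        "deletionTimestamp": "2020-03-25T11:29:48Z",
        "finalizers": ["subscription-controller"],
    }
}

print(is_pending_finalization(subscription))  # True
```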

The eventing-controller logs need to be checked to find out why https://github.com/knative/eventing/blob/release-0.12/pkg/reconciler/subscription/subscription.go#L169 failed, or why the finalizer was not removed.

anishj0shi commented 4 years ago

This issue doesn't occur at the moment. We will reopen it if it happens again.

sayanh commented 4 years ago

Troubleshooting guide

Events are not working after upgrade (from 1.10 to 1.11)

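# List all Triggers and look for ones that are not ready (READY is False, REASON is NotSubscribed)
kubectl get triggers.eventing.knative.dev -A
NAMESPACE   NAME                                   READY   REASON                 BROKER    SUBSCRIBER_URI           AGE
test        654331ad-f6ab-56ab-bd91-e0b397afb7b8   False   NotSubscribed          default   http://test.test:8080/   39h
# Inspect the conditions of the affected Trigger
kubectl describe triggers.eventing.knative.dev -n <namespace> <trigger_name>
# Find the Subscription that belongs to the Trigger
kubectl get subscriptions.messaging.knative.dev -n <namespace> -l eventing.knative.dev/trigger=<trigger_name>
# Scale down the eventing-controller before touching the Subscription
kubectl scale deploy -n knative-eventing eventing-controller --replicas=0
# Remove the dangling finalizer so that the Subscription can be deleted
kubectl patch -n <namespace> subscriptions.messaging.knative.dev <subscription_name> --type merge -p '{"metadata": {"finalizers": []}}'
# Scale the eventing-controller back up
kubectl scale deploy -n knative-eventing eventing-controller --replicas=1
# Verify that the Trigger is ready
kubectl get triggers.eventing.knative.dev -n <namespace> <trigger_name>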
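
The three mutating commands above (scale down, remove the finalizer, scale up) could be assembled programmatically, for example to repeat them for several Subscriptions. The following is a sketch only: `remediation_commands` is a hypothetical helper, the namespace and Subscription name in the usage lines are placeholders, and the code builds and prints the commands without running them.

```python
import json
import shlex
from typing import List


def remediation_commands(namespace: str, subscription: str) -> List[List[str]]:
    """Build the kubectl argument lists for the finalizer workaround, in order."""
    # Merge patch that clears the finalizers list on the Subscription.
    patch = json.dumps({"metadata": {"finalizers": []}})
    scale = ["kubectl", "scale", "deploy", "-n", "knative-eventing", "eventing-controller"]
    return [
        scale + ["--replicas=0"],
        [
            "kubectl", "patch", "-n", namespace,
            "subscriptions.messaging.knative.dev", subscription,
            "--type", "merge", "-p", patch,
        ],
        scale + ["--replicas=1"],
    ]


# Placeholder names; replace them with the values found in the earlier steps.
for argv in remediation_commands("test", "example-subscription"):
    # shlex.join quotes the JSON patch so the line is safe to paste into a shell.
    print(shlex.join(argv))
```

Each argument list can also be passed to `subprocess.run(argv, check=True)` once the printed commands have been reviewed.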