Deleting the load-balancing service will not stop the endless loop. Instead it pops up a few more errors and continues:
2021-05-11 14:28:43,135 - INFO - [nitrointerface.py:_delete_nsapp_cs_vserver:1782] (MainThread) csvserver stg-intern-nginx-ingress_80_kube-system_svc is deleted successfully
2021-05-11 14:28:43,135 - INFO - [nitrointerface.py:delete_nsapp:3763] (MainThread) Deleting application: stg-intern-nginx-ingress_443_kube-system_svc LB Role: server
2021-05-11 14:28:43,203 - INFO - [nitrointerface.py:_unbind_default_cs_policy:3204] (MainThread) stg-intern-nginx-ingress_443_lbv_jzp5b6b4tfpli5qcnbgtl7v3zmqtzjzk lbvserver unbind from stg-intern-nginx-ingress_443_kube-system_svc csvserver is successful
2021-05-11 14:28:43,242 - INFO - [nitrointerface.py:_delete_nsapp_service_group:1488] (MainThread) servicegroup stg-intern-nginx-ingress_443_sgp_jzp5b6b4tfpli5qcnbgtl7v3zmqtzjzk is deleted successfully
2021-05-11 14:28:43,291 - INFO - [referencemanager.py:process_unmanaged_delete_event:1053] (MainThread) Deleting Unmanaged entity: kube-system.lbvserver.intern-nginx-ingress - stg-intern-nginx-ingress_443_lbv_jzp5b6b4tfpli5qcnbgtl7v3zmqtzjzk
2021-05-11 14:28:43,334 - INFO - [nitrointerface.py:_delete_nsapp_vserver:1258] (MainThread) LBvserver stg-intern-nginx-ingress_443_lbv_jzp5b6b4tfpli5qcnbgtl7v3zmqtzjzk is deleted successfully
2021-05-11 14:28:43,334 - INFO - [referencemanager.py:process_unmanaged_delete_event:1053] (MainThread) Deleting Unmanaged entity: kube-system.csvserver_lbsvc.intern-nginx-ingress - stg-intern-nginx-ingress_443_kube-system_svc
2021-05-11 14:28:43,425 - INFO - [nitrointerface.py:_delete_nsapp_cs_vserver:1782] (MainThread) csvserver stg-intern-nginx-ingress_443_kube-system_svc is deleted successfully
2021-05-11 14:28:43,436 - INFO - [clienthelper.py:get:49] (MainThread) Resource not found: /services/intern-nginx-ingress namespace kube-system
2021-05-11 14:28:43,436 - ERROR - [customresourcecontroller.py:event_handler:232] (MainThread) FAILURE: DELIVERING CRD event: Exception "local variable 'crd_name' referenced before assignment" while handling event for crd service-intern-nginx-ingress.kube-system of kind vip
2021-05-11 14:28:43,453 - INFO - [clienthelper.py:get:49] (MainThread) Resource not found: /endpoints/intern-nginx-ingress namespace kube-system
2021-05-11 14:28:43,453 - INFO - [kubernetes.py:get_endpoints_for_service:2541] (MainThread) Failed to get endpoints list for the app intern-nginx-ingress
2021-05-11 14:28:43,454 - INFO - [kubernetes.py:update_cpx_for_apps:4410] (MainThread) Handling Type LoadBalancer Service Modification intern-nginx-ingress.kube-system
2021-05-11 14:28:43,454 - INFO - [kubernetes.py:kubernetes_service_to_nsapps:2758] (MainThread) Handling Service creation/Modification intern-nginx-ingress.kube-system
2021-05-11 14:28:43,454 - INFO - [kubernetes.py:kubernetes_service_to_nsapps:2991] (MainThread) Configuring Type LoadBalancer Service intern-nginx-ingress:kube-system port params:{'name': 'http', 'protocol': 'tcp', 'port': 80, 'targetPort': 80, 'nodePort': 31219, 'vip': '172.31.203.231', 'com/class': 'intern', 'stylebook': None, 'sslcert': {}, 'range-name': None, 'stylebook_params': {}, 'stylebook_service_params': {}}
2021-05-11 14:28:43,454 - INFO - [kubernetes.py:kubernetes_service_to_nsapps:2997] (MainThread) Updating the LoadBalancer service kube-system:intern-nginx-ingress status with IP:172.31.203.231
2021-05-11 14:28:43,465 - INFO - [clienthelper.py:patch:73] (MainThread) Got status code 404, Resource not found: API: /services/intern-nginx-ingress/status namespace kube-system
2021-05-11 14:28:43,487 - INFO - [clienthelper.py:post:100] (MainThread) Got status code 409, Resource already exists request api: /vips namespace: kube-system, no action needed
2021-05-11 14:28:43,487 - INFO - [kubernetes.py:kubernetes_service_to_nsapps:2991] (MainThread) Configuring Type LoadBalancer Service intern-nginx-ingress:kube-system port params:{'name': 'https', 'protocol': 'tcp', 'port': 443, 'targetPort': 443, 'nodePort': 31755, 'vip': '172.31.203.231', 'com/class': 'intern', 'stylebook': None, 'sslcert': {}, 'range-name': None, 'stylebook_params': {}, 'stylebook_service_params': {}}
2021-05-11 14:28:43,488 - INFO - [kubernetes.py:kubernetes_service_to_nsapps:2997] (MainThread) Updating the LoadBalancer service kube-system:intern-nginx-ingress status with IP:172.31.203.231
2021-05-11 14:28:43,497 - INFO - [clienthelper.py:patch:73] (MainThread) Got status code 404, Resource not found: API: /services/intern-nginx-ingress/status namespace kube-system
2021-05-11 14:28:43,515 - INFO - [clienthelper.py:post:100] (MainThread) Got status code 409, Resource already exists request api: /vips namespace: kube-system, no action needed
2021-05-11 14:28:43,516 - INFO - [nitrointerface.py:configure_ns_cs_app:3614] (MainThread) Configuring csvserver: stg-intern-nginx-ingress_80_kube-system_svc and associated services
2021-05-11 14:28:43,585 - INFO - [nitrointerface.py:_create_nsapp_cs_vserver:2725] (MainThread) csvserver stg-intern-nginx-ingress_80_kube-system_svc is created successfully
2021-05-11 14:28:43,585 - INFO - [referencemanager.py:process_unmanaged_add_event:1013] (MainThread) Adding unmanaged entity: kube-system.csvserver_lbsvc.intern-nginx-ingress - stg-intern-nginx-ingress_80_kube-system_svc
2021-05-11 14:28:43,586 - INFO - [nitrointerface.py:create_entities_for_policy:1834] (MainThread) Processing lbvserver:stg-intern-nginx-ingress_80_lbv_3ptbzysfiwgrnie2hic3wxvozioiloaq for csvserver:stg-intern-nginx-ingress_80_kube-system_svc service type for lbvserver: tcp service type for servicegroup:tcp
2021-05-11 14:28:43,648 - INFO - [nitrointerface.py:_create_nsapp_vserver:1237] (MainThread) lbvserver stg-intern-nginx-ingress_80_lbv_3ptbzysfiwgrnie2hic3wxvozioiloaq is created successfully
2021-05-11 14:28:43,767 - INFO - [nitrointerface.py:_bind_default_cs_policy:3231] (MainThread) csvserver stg-intern-nginx-ingress_80_kube-system_svc binding to lbvserver stg-intern-nginx-ingress_80_lbv_3ptbzysfiwgrnie2hic3wxvozioiloaq as default policy is successful
2021-05-11 14:28:43,839 - INFO - [nitrointerface.py:_create_nsapp_service_group:1452] (MainThread) Servicegroup stg-intern-nginx-ingress_80_sgp_3ptbzysfiwgrnie2hic3wxvozioiloaq is created successfully
2021-05-11 14:28:43,904 - INFO - [nitrointerface.py:_bind_service_group_lb:1536] (MainThread) servicegroup stg-intern-nginx-ingress_80_sgp_3ptbzysfiwgrnie2hic3wxvozioiloaq bind to lbvserver stg-intern-nginx-ingress_80_lbv_3ptbzysfiwgrnie2hic3wxvozioiloaq is successful
2021-05-11 14:28:43,965 - INFO - [nitrointerface.py:_configure_services_nondesired:1735] (MainThread) Binding 172.31.102.128:31219 from servicegroup stg-intern-nginx-ingress_80_sgp_3ptbzysfiwgrnie2hic3wxvozioiloaq is successful
2021-05-11 14:28:44,008 - INFO - [nitrointerface.py:_configure_services_nondesired:1735] (MainThread) Binding 172.31.102.125:31219 from servicegroup stg-intern-nginx-ingress_80_sgp_3ptbzysfiwgrnie2hic3wxvozioiloaq is successful
2021-05-11 14:28:44,009 - INFO - [referencemanager.py:process_unmanaged_add_event:1013] (MainThread) Adding unmanaged entity: kube-system.lbvserver.intern-nginx-ingress - stg-intern-nginx-ingress_80_lbv_3ptbzysfiwgrnie2hic3wxvozioiloaq
2021-05-11 14:28:44,009 - INFO - [nitrointerface.py:configure_ns_cs_app:3654] (MainThread) Finished processing instruction to configure stg-intern-nginx-ingress_80_kube-system_svc app associated with stg-intern-nginx-ingress_80_kube-system_svc csvserver
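For what it's worth, the `crd_name` error above looks like the classic Python unbound-local pattern: the variable is only assigned on the happy path, and the 404 path then reads it anyway. A minimal sketch of that failure mode (the handler and all names here are mine, not the actual controller code):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def handle_crd_event(resource):
    """Hypothetical handler: crd_name is only bound on the happy path."""
    if resource is not None:
        crd_name = resource["metadata"]["name"]
        log.info("processing %s", crd_name)
    # When the resource was already deleted (resource is None), crd_name was
    # never assigned, so the next line raises UnboundLocalError.
    log.info("finished event for %s", crd_name)

try:
    handle_crd_event(None)  # simulates the "Resource not found" path from the log above
except UnboundLocalError as exc:
    print(exc)  # local variable 'crd_name' referenced before assignment
```

If the event handler dies there, the delete presumably never completes cleanly, which would fit the loop.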
@lhw We are trying to reproduce and find the root cause of the issue. We will get back to you on this.
@lhw Unfortunately we were not able to reproduce this issue. We would like some more details from you:
`kubectl get vip --all-namespaces`
I recreated the issue for you:
- Is it possible for you to share the IPAM logs for this duration?
Here is the complete log for the time period: https://gist.github.com/lhw/3e998cda187c17ce8bd08ef3ebf1e09d (under cic.log). Luckily it is only around 1900 lines for the minute.
- Is this the complete CIC log during the given timeframe or is it filtered?
It wasn't filtered. The gist link also includes the cic-ipam.log.
- If you still have the VIP CRD resource in your cluster, can you share it?
The gist link also contains the VIP YAML that I was able to grab before it was deleted again.
@lhw Just clarifying a few things:
- Is there only one instance of Citrix Ingress Controller and IPAM running in the cluster?
The issue is present on two clusters, but all of our clusters have more than one ingress controller. The one from the log has three. The Helm values for all three are here: https://gist.github.com/lhw/5a2c52260620ef6f4106b4a7f75417cb. Each has its own service-class, though.
- Highly unlikely, but is there a daemon process or workload monitor which might be deleting the VIP resources created by the ingress controller in the kube-system namespace?
Only the CIC roles have access to the VIPs, so nothing else can touch the resource. But no, there is no additional tool interacting with it.
Since you pointed out the other CICs: here is the log of the other CICs from the same time period: https://gist.github.com/lhw/7b9f2e2a2b0992317c1f37155060f9e4
They seem to be reacting to the service even though the service-class does not match their configured value.
After disabling the IPAM feature on both the `extern` and `extern-fbt` CICs it works now. So it looks like the CICs are ignoring the service-class.
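For comparison, here is the kind of class check I would have expected each CIC to apply before acting on a service. This is only a sketch: the annotation key `service.citrix.com/class` is my assumption (the log output truncates it to 'com/class'), and the helper is not the actual controller code:

```python
MY_SERVICE_CLASS = "intern"  # each CIC instance is deployed with its own value

def should_handle(service: dict) -> bool:
    """Return True only if the service's class annotation matches this CIC."""
    annotations = service.get("metadata", {}).get("annotations", {})
    svc_class = annotations.get("service.citrix.com/class")  # assumed key
    return svc_class == MY_SERVICE_CLASS

# A CIC configured for "extern" should skip this "intern" service entirely,
# instead of deleting and recreating its VIP.
svc = {"metadata": {"annotations": {"service.citrix.com/class": "intern"}}}
print(should_handle(svc))  # True for the "intern" CIC, False for the others
```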
@lhw Thanks a lot for all the information. We know the root cause now. This will be fixed in the next release.
**Describe the bug**
An address allocated for a load balancer is created and deleted dozens of times a second by the interaction between the IPAM and the ingress controller.
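To make the interaction concrete, here is a toy model of the loop as I observe it (all names are mine and the IP is just the one from the logs linked below; this is not the controllers' actual code):

```python
# One reconciler allocates and configures the VIP, another tears it down
# again, so the service's address never converges.
def configure(state):
    if "vip" not in state:
        state["vip"] = "172.31.203.231"   # IPAM allocates, CIC configures
        print("configured VIP", state["vip"])

def teardown(state):
    if "vip" in state:
        print("deleted VIP", state.pop("vip"))

state = {}
for _ in range(3):  # in the real cluster this repeats dozens of times a second
    configure(state)
    teardown(state)
```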
**To Reproduce**
**Expected behavior**
And I almost consider the last point optional at this point
**Logs**
kubectl logs: https://gist.github.com/lhw/76ef70823251bea2db202d51de951f07
Kubernetes service: