kumahq / kuma

🐻 The multi-zone service mesh for containers, Kubernetes and VMs. Built with Envoy. CNCF Sandbox Project.
https://kuma.io/install
Apache License 2.0

Kuma external service issue - 2.8.2 #11059

Open brunda-bs opened 1 month ago

brunda-bs commented 1 month ago

What happened?

We are trying to upgrade Kuma from 2.5.2 to 2.8.2. We see an issue with external service connectivity post-upgrade. We have 500+ external services in our current setup.

Kuma control plane error logs:

2024-08-06T04:34:30.568Z ERROR vips-outbound-view there are two external services with the same 'networking.address' or two headless services, to disable automatic DNS generation in external services set 'networking.disableHostDNSEntry=true' in at least one of these externalService {"error": "autogenerated DNS entry cpmsignal-training.xyz-digital.net:443 from external-service:cpmsignal-training-xyz-digital-net-443-4vvvf74fbw6w4vz6 conflicts with existing entry external-service:cpmsignal-training-xyz-digital-net-443"}

The above error appears to be caused by the metadata name change for the external service in zone tenants after the upgrade. Although the error disappears after 10-15 minutes, the application container still cannot connect to the external service: it fails with either a connection refused error or a failure to resolve the external DNS name.

The external services that were created in 2.5 had the metadata.name cpmsignal-training-xyz-digital-net-443. After the upgrade, the global CP still uses the same name, but the zone CP somehow recreates a local copy of the external service configuration with a different metadata.name: cpmsignal-training-xyz-digital-net-443-4vvvf74fbw6w4vz6
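
To check a zone for this kind of clash, here is a minimal Python sketch of the duplicate check the error message describes. It assumes the ExternalService resources have already been exported as a list of dicts with the Kubernetes-style `metadata.name` and `spec.networking.address` fields shown in this issue. The function name `find_address_conflicts` and the sample data are hypothetical, and this is not Kuma's own implementation.

```python
from collections import defaultdict


def find_address_conflicts(external_services):
    """Return {address: [names]} for every networking.address used by more than one resource."""
    names_by_address = defaultdict(list)
    for svc in external_services:
        address = svc["spec"]["networking"]["address"]
        names_by_address[address].append(svc["metadata"]["name"])
    # Keep only addresses that are claimed by two or more ExternalService objects.
    return {addr: names for addr, names in names_by_address.items() if len(names) > 1}


# Hypothetical sample mirroring the two names reported in the error log.
services = [
    {
        "metadata": {"name": "cpmsignal-training-xyz-digital-net-443"},
        "spec": {"networking": {"address": "cpmsignal-training.xyz-digital.net:443"}},
    },
    {
        "metadata": {"name": "cpmsignal-training-xyz-digital-net-443-4vvvf74fbw6w4vz6"},
        "spec": {"networking": {"address": "cpmsignal-training.xyz-digital.net:443"}},
    },
]

conflicts = find_address_conflicts(services)
for address, names in conflicts.items():
    print(address, "is claimed by", names)
```

An address listed here with both the original and the hash-suffixed name would match the situation in the log, where the old and the newly synced copies coexist in the zone.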

Global CP external service:

Name:         cpmsignalr-training-xyz-digital-net-443
Namespace:    
Labels:       kustomize.toolkit.fluxcd.io/name=mesh-external-services
              kustomize.toolkit.fluxcd.io/namespace=kuma-system
Annotations:  <none>
API Version:  kuma.io/v1alpha1
Kind:         ExternalService
Mesh:         default
Metadata:
  Creation Timestamp:  2024-06-17T10:46:07Z
  Generation:          1
  Owner References:
    API Version:     kuma.io/v1alpha1
    Kind:            Mesh
    Name:            default
    UID:             57869104-9403-4b96-afa7-01ea9fcccf7b
  Resource Version:  1696511361
  UID:               8fb7ec65-15c6-48aa-b2f8-adef04a3e49b
Spec:
  Networking:
    Address:  cpmsignalr-training.xyz-digital.net:443
    Tls:
      Enabled:  false
  Tags:
    kuma.io/protocol:  tcp
    kuma.io/service:   cpmsignalr-training-xyz-digital-net-443
Events:                <none>

Zone CP external service:

Name:         cpmsignalr-training-xyz-digital-net-443-4vvvf74fbw6w4vz6
Namespace:    
Labels:       kuma.io/origin=global
              kustomize.toolkit.fluxcd.io/name=mesh-external-services
              kustomize.toolkit.fluxcd.io/namespace=kuma-system
Annotations:  kuma.io/display-name: cpmsignalr-training-xyz-digital-net-443
API Version:  kuma.io/v1alpha1
Kind:         ExternalService
Mesh:         default
Metadata:
  Creation Timestamp:  2024-08-06T03:15:51Z
  Generation:          1
  Resource Version:    1366906165
  UID:                 c7ea5a83-6b85-4a53-ad6c-a51f9425441f
Spec:
  Networking:
    Address:  cpmsignalr-training.xyz-digital.net:443
    Tls:
  Tags:
    kuma.io/protocol:  tcp
    kuma.io/service:   cpmsignalr-training-xyz-digital-net-443

slonka commented 1 month ago

triage: our upgrade policy allows upgrading by up to 2 minor versions at a time, so you should upgrade from 2.5.2 to 2.7.6 and then from 2.7.6 to 2.8.2. Also, are you sure you upgraded global first?
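
The stepping rule above can be sketched in Python as follows. This is only an illustration of the "at most 2 minor versions per step" policy. The function name `plan_minor_hops` is hypothetical, it works within a single major version, and it returns minor lines only because the right patch release for each step has to be chosen separately.

```python
def plan_minor_hops(current, target, max_minor_jump=2):
    """List the minor versions to step through, never skipping more than max_minor_jump."""
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    tgt_major, tgt_minor = (int(part) for part in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target version is older than the current version")

    hops = []
    minor = cur_minor
    while minor < tgt_minor:
        # Jump as far as the policy allows, but never past the target.
        minor = min(minor + max_minor_jump, tgt_minor)
        hops.append(f"{cur_major}.{minor}")
    return hops


hops = plan_minor_hops("2.5.2", "2.8.2")
print(hops)  # ['2.7', '2.8']
```

For the versions in this issue the sketch yields one intermediate step on the 2.7 line before 2.8, which matches the 2.5.2 to 2.7.6 to 2.8.2 path suggested above.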

lukidzi commented 3 weeks ago

@brunda-bs any news?

brunda-bs commented 3 weeks ago

Hi @lukidzi / @slonka

Yes, we did upgrade the global first.

bartsmykla commented 2 weeks ago

triage: did you also upgrade as suggested, from 2.5.x to 2.7.x first and then from 2.7.x to 2.8.x?