kumahq / kuma

🐻 The multi-zone service mesh for containers, Kubernetes and VMs. Built with Envoy. CNCF Sandbox Project.
https://kuma.io/install
Apache License 2.0

Kuma external service issue - 2.8.2 #11059

Closed brunda-bs closed 3 weeks ago

brunda-bs commented 3 months ago

What happened?

We are trying to upgrade Kuma from 2.5.2 to 2.8.2. We see an issue with external service connectivity post-upgrade. We have 500+ external services in our current setup.

Kuma control plane error logs:

    2024-08-06T04:34:30.568Z ERROR vips-outbound-view there are two external services with the same 'networking.address' or two headless services, to disable automatic DNS generation in external services set 'networking.disableHostDNSEntry=true' in at least one of these externalService {"error": "autogenerated DNS entry cpmsignal-training.xyz-digital.net:443 from external-service:cpmsignal-training-xyz-digital-net-443-4vvvf74fbw6w4vz6 conflicts with existing entry external-service:cpmsignal-training-xyz-digital-net-443"}

This error appears to be caused by the metadata name of the external service changing in the zone tenants after the upgrade. Although the error disappears after 10-15 minutes, the application container still cannot connect to the external service: it fails with either a connection refused error or a failure to resolve the external DNS name.

The external services created in 2.5 had the metadata.name cpmsignal-training-xyz-digital-net-443. After the upgrade, the global CP still uses the same name, but the zone CP recreates a local copy of the external service configuration with a different metadata.name: cpmsignal-training-xyz-digital-net-443-4vvvf74fbw6w4vz6
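The conflict reported in the log can be illustrated with a minimal Python sketch. It assumes the zone control plane sees both the pre-upgrade copy and the hash-suffixed copy of the same ExternalService at once; the function name find_address_conflicts and the dictionary layout are hypothetical, while the names and address are copied from the log above.

```python
from collections import defaultdict


def find_address_conflicts(external_services):
    """Group ExternalService names by address and keep only shared addresses."""
    names_by_address = defaultdict(list)
    for svc in external_services:
        names_by_address[svc["address"]].append(svc["name"])
    return {
        address: sorted(names)
        for address, names in names_by_address.items()
        if len(names) > 1
    }


# Names and address copied from the control plane log; the layout is illustrative.
zone_resources = [
    {"name": "cpmsignal-training-xyz-digital-net-443",
     "address": "cpmsignal-training.xyz-digital.net:443"},
    {"name": "cpmsignal-training-xyz-digital-net-443-4vvvf74fbw6w4vz6",
     "address": "cpmsignal-training.xyz-digital.net:443"},
]

conflicts = find_address_conflicts(zone_resources)
for address, names in conflicts.items():
    print(f"{address} is claimed by: {', '.join(names)}")
```

Any address that maps to more than one resource name corresponds to the situation the vips-outbound-view error describes.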

Global CP external service:

Name:         cpmsignalr-training-xyz-digital-net-443
Namespace:    
Labels:       kustomize.toolkit.fluxcd.io/name=mesh-external-services
              kustomize.toolkit.fluxcd.io/namespace=kuma-system
Annotations:  <none>
API Version:  kuma.io/v1alpha1
Kind:         ExternalService
Mesh:         default
Metadata:
  Creation Timestamp:  2024-06-17T10:46:07Z
  Generation:          1
  Owner References:
    API Version:     kuma.io/v1alpha1
    Kind:            Mesh
    Name:            default
    UID:             57869104-9403-4b96-afa7-01ea9fcccf7b
  Resource Version:  1696511361
  UID:               8fb7ec65-15c6-48aa-b2f8-adef04a3e49b
Spec:
  Networking:
    Address:  cpmsignalr-training.xyz-digital.net:443
    Tls:
      Enabled:  false
  Tags:
    kuma.io/protocol:  tcp
    kuma.io/service:   cpmsignalr-training-xyz-digital-net-443
Events:                <none>

Zone CP external service:

Name:         cpmsignalr-training-xyz-digital-net-443-4vvvf74fbw6w4vz6
Namespace:    
Labels:       kuma.io/origin=global
              kustomize.toolkit.fluxcd.io/name=mesh-external-services
              kustomize.toolkit.fluxcd.io/namespace=kuma-system
Annotations:  kuma.io/display-name: cpmsignalr-training-xyz-digital-net-443
API Version:  kuma.io/v1alpha1
Kind:         ExternalService
Mesh:         default
Metadata:
  Creation Timestamp:  2024-08-06T03:15:51Z
  Generation:          1
  Resource Version:    1366906165
  UID:                 c7ea5a83-6b85-4a53-ad6c-a51f9425441f
Spec:
  Networking:
    Address:  cpmsignalr-training.xyz-digital.net:443
    Tls:
  Tags:
    kuma.io/protocol:  tcp
    kuma.io/service:   cpmsignalr-training-xyz-digital-net-443

slonka commented 3 months ago

triage: our upgrade policy supports upgrading across at most 2 minor versions at a time, so you should upgrade from 2.5.2 to 2.7.6 and then from 2.7.6 to 2.8.2. Also, are you sure you upgraded global first?
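The stepwise path can be worked out mechanically. This is a minimal Python sketch, assuming only the rule stated above (at most 2 minor versions per upgrade); the function name upgrade_hops is hypothetical, and patch versions are left as x because the sketch does not know which patch release to pick.

```python
def upgrade_hops(current_minor, target_minor, max_skip=2):
    """Return the minor versions to pass through, jumping at most max_skip minors per hop."""
    if max_skip < 1:
        raise ValueError("max_skip must be at least 1")
    if target_minor < current_minor:
        raise ValueError("downgrades are not covered by this sketch")
    hops = []
    minor = current_minor
    while minor < target_minor:
        # Take the largest allowed jump without overshooting the target.
        minor = min(minor + max_skip, target_minor)
        hops.append(minor)
    return hops


path = [f"2.{minor}.x" for minor in upgrade_hops(5, 8)]
print(" -> ".join(["2.5.x"] + path))  # prints "2.5.x -> 2.7.x -> 2.8.x"
```

The greedy choice reproduces the suggested route through 2.7; other routes that respect the same limit, such as going through 2.6, would satisfy the rule too.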

lukidzi commented 3 months ago

@brunda-bs any news?

brunda-bs commented 3 months ago

Hi @lukidzi / @slonka

Yes, we did upgrade the global first.

bartsmykla commented 3 months ago

triage: did you also upgrade, as suggested, from 2.5.x to 2.7.x first and then from 2.7.x to 2.8.x?

lukidzi commented 2 months ago

I've tried to reproduce it, but no luck. My steps:

  1. Install kuma 2.5.2 on global

    ./kumactl install control-plane \
    --set "controlPlane.mode=global" \
    | kubectl apply -f -

  2. Install kuma 2.5.2 on zone

    ./kumactl install control-plane \
    --set "controlPlane.mode=zone" \
    --set "controlPlane.zone=zone" \
    --set "ingress.enabled=true" --set "egress.enabled=true" \
    --set "controlPlane.kdsGlobalAddress=grpcs://<GLOBAL_IP>:5685" \
    --set "controlPlane.tls.kdsZoneClient.skipVerify=true" \
    | kubectl apply -f -

  3. Create external services

    apiVersion: kuma.io/v1alpha1
    kind: ExternalService
    mesh: default
    metadata:
      name: httpbin
    spec:
      tags:
        kuma.io/service: httpbin
        kuma.io/protocol: tcp
      networking:
        address: httpbin.org:443
        tls: # optional
          enabled: false

  4. Install demo-app: kumactl install demo | kubectl apply -f -

  5. Check that there is only one ExternalService on the zone: kubectl get externalservices

    NAME      AGE
    httpbin   10m

  6. Upgrade global control-plane to 2.8.2 (the same command as in 1)

  7. Upgrade zone control-plane to 2.8.2 (the same command as in 2 with global ip)

  8. Check logs and external services: there is only one ExternalService, with a hash suffix, and no errors.

@brunda-bs could you check if the reproduction steps are correct? I couldn't reproduce it.
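The check in steps 5 and 8 could also be scripted. Below is a minimal Python sketch, assuming the output of kubectl get externalservices -o json is passed in as a string and that synced copies carry the kuma.io/display-name annotation seen on the zone resource earlier in this thread. The function name duplicated_display_names and the sample data, including the suffix abc123, are made up for illustration.

```python
import json
from collections import defaultdict

DISPLAY_NAME = "kuma.io/display-name"


def duplicated_display_names(kubectl_json):
    """Map each display name to its resource names when more than one resource shares it."""
    grouped = defaultdict(list)
    for item in json.loads(kubectl_json).get("items", []):
        meta = item["metadata"]
        annotations = meta.get("annotations") or {}
        # Resources without the annotation are grouped under their own name.
        display = annotations.get(DISPLAY_NAME, meta["name"])
        grouped[display].append(meta["name"])
    return {d: sorted(names) for d, names in grouped.items() if len(names) > 1}


# Illustrative input shaped like a kubectl JSON list; not real cluster output.
sample = json.dumps({"items": [
    {"metadata": {"name": "httpbin"}},
    {"metadata": {"name": "httpbin-abc123",
                  "annotations": {DISPLAY_NAME: "httpbin"}}},
]})
duplicates = duplicated_display_names(sample)
print(duplicates)
```

An empty result would match what the reproduction observed: a single ExternalService per display name on the zone.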

slonka commented 1 month ago

@brunda-bs could you take a look at the reproduction steps that Lukasz posted?

lukidzi commented 1 month ago

@brunda-bs Hi, have you had a chance to take a look?

bartsmykla commented 3 weeks ago

triage: it's been more than a month without information