istio-ecosystem / admiral

Admiral provides automatic configuration generation, syncing and service discovery for multicluster Istio service mesh
Apache License 2.0

[BUG] The GlobalTrafficPolicy doesn't failover when weights declared #134

Open wenxian opened 4 years ago

wenxian commented 4 years ago

Describe the bug

If weights are declared in the GTP, 10 consecutive 5xx errors (consecutive5xxErrors) do not trigger failover to the other region.

Steps To Reproduce

apiVersion: admiral.io/v1alpha1
kind: GlobalTrafficPolicy
metadata:
  name: gtp-admiral-sample
  namespace: sample-admiral
  labels:
    env: default
    identity: webapp-sample-admiral
spec:
  policy:
  - dns: default.webapp-sample-admiral.global
    lbType: 1 #0 represents TOPOLOGY, 1 represents FAILOVER
    target:
    - region: us-west-2
      weight: 10
    - region: us-east-1
      weight: 90

Expected behavior

A service that returns HTTP 500 ten consecutive times should be ejected so that traffic fails over. Instead, it is not ejected when the GTP weights (90/10) are applied.

Without the GTP, failover works after 10 consecutive 500 errors.
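For reference, here is a minimal sketch of what consecutive5xxErrors is expected to do, assuming a simplified model of Envoy's consecutive-5xx outlier detection: each upstream host keeps a streak of 5xx responses, any non-5xx response resets the streak, and reaching the threshold ejects the host for baseEjectionTime multiplied by the number of times it has been ejected. The class and method names are hypothetical, and the sketch ignores Envoy details such as the interval sweep and the maximum ejection percentage.

```python
from dataclasses import dataclass, field


@dataclass
class OutlierDetector:
    """Simplified consecutive-5xx ejection (hypothetical model). Times are seconds."""
    consecutive_5xx: int = 10          # DR field: consecutive5xxErrors
    base_ejection_time: float = 120.0  # DR field: baseEjectionTime
    failures: dict = field(default_factory=dict)        # host -> current 5xx streak
    ejection_count: dict = field(default_factory=dict)  # host -> times ejected so far
    ejected_until: dict = field(default_factory=dict)   # host -> time it is un-ejected

    def record(self, host: str, status: int, now: float) -> None:
        if status >= 500:
            self.failures[host] = self.failures.get(host, 0) + 1
            if self.failures[host] >= self.consecutive_5xx and not self.is_ejected(host, now):
                n = self.ejection_count.get(host, 0) + 1
                self.ejection_count[host] = n
                # Ejection time grows with the number of times the host was ejected.
                self.ejected_until[host] = now + self.base_ejection_time * n
                self.failures[host] = 0
        else:
            # Any non-5xx response resets the consecutive-error streak.
            self.failures[host] = 0

    def is_ejected(self, host: str, now: float) -> bool:
        return now < self.ejected_until.get(host, float("-inf"))
```

With the values used in this thread (consecutive5xxErrors: 10, baseEjectionTime: 120s), the tenth consecutive 500 ejects the host for 120 seconds, after which it becomes eligible for traffic again.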

aattuluri commented 4 years ago

@wenxian This could be an Istio issue, but I remember this was tested at least with Istio 1.5.x.

i) What Istio version are you using?
ii) Can you paste the destination rule generated after applying this GTP?

wenxian commented 4 years ago

We are using Istio 1.6.

Namespace:    admiral-sync
Labels:       <none>
Annotations:  <none>
API Version:  networking.istio.io/v1beta1
Kind:         DestinationRule
Metadata:
  Creation Timestamp:  2020-08-10T21:22:01Z
  Generation:          10
  Resource Version:    170254904
  Self Link:           /apis/networking.istio.io/v1beta1/namespaces/admiral-sync/destinationrules/default.greeting-sample-showgtp.global-default-dr
  UID:                 83651bea-145a-4fdc-8efb-92601c695c76
Spec:
  Host:  default.greeting-sample-showgtp.global
  Traffic Policy:
    Load Balancer:
      Locality Lb Setting:
        Distribute:
          From:  us-east-1/*
          To:
            us-east-1:  99 #50
            us-west-2:  1 #50
      Simple:           ROUND_ROBIN
    Outlier Detection:
      Base Ejection Time:    120s
      consecutive5xxErrors:  10
      Interval:              5s
    Tls:
      Mode:  ISTIO_MUTUAL
Events:      <none>
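The GTP targets end up in the DestinationRule's localityLbSetting distribute entry (From / To). Below is a minimal sketch of that mapping, assuming one distribute entry per source region and weights that Istio expects to sum to 100. gtp_to_distribute is a hypothetical helper for illustration, not Admiral's actual code.

```python
def gtp_to_distribute(source_region, targets):
    """Build one distribute entry from GTP targets (hypothetical helper).

    `targets` is a list of {"region": ..., "weight": ...} dicts shaped like
    the GTP `target` list in the issue description.
    """
    to = {t["region"]: int(t["weight"]) for t in targets}
    total = sum(to.values())
    if total != 100:
        # Istio expects the `to` weights of a distribute entry to sum to 100.
        raise ValueError(f"weights sum to {total}, expected 100")
    # Traffic originating anywhere in the source region is split per `to`.
    return {"from": f"{source_region}/*", "to": to}
```

For the sample GTP (us-west-2: 10, us-east-1: 90) and callers in us-east-1, this yields From: us-east-1/* with To: us-west-2: 10, us-east-1: 90.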

I am calling from us-east-1. I found that as long as the local weight (us-east-1) is >= 50, calls always stay local (us-east-1), which means:

1. the weights are not applied (10 of 10 requests land in east), and
2. traffic does not fail over to the remote region (west).
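To judge what "10 of 10 in east" means, here is a minimal sketch of locality-weighted selection, assuming every endpoint is healthy so that each locality's share is simply its weight divided by the sum of weights. It models each request as an independent weighted draw; Envoy actually uses a weighted scheduler, so treat the numbers as a rough guide. pick_locality, simulate and p_all_local are hypothetical names.

```python
import random


def pick_locality(weights, r):
    """Map an integer r in [0, total weight) to a locality.

    Localities are laid out back to back, each covering a span equal to its
    weight, so each is chosen with probability weight / total.
    """
    for locality, weight in weights.items():
        if r < weight:
            return locality
        r -= weight
    raise ValueError("r is outside [0, total weight)")


def simulate(weights, n, seed=0):
    """Count how many of n independent weighted draws land in each locality."""
    rng = random.Random(seed)
    total = sum(weights.values())
    counts = {locality: 0 for locality in weights}
    for _ in range(n):
        counts[pick_locality(weights, rng.randrange(total))] += 1
    return counts


def p_all_local(weights, local, n):
    """Probability that n independent draws all land in `local`."""
    return (weights[local] / sum(weights.values())) ** n
```

Under this model, with the 99/1 split shown in the DestinationRule, all ten requests landing in us-east-1 has probability 0.99^10 ≈ 0.90, so that result alone would not show that the weights are ignored. With 50/50 it drops to 0.5^10 ≈ 0.001, so seeing 10 of 10 in east at 50/50 does point to the weights not being honored end to end.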

aattuluri commented 4 years ago

@wenxian I see the destination rule has been generated with the correct weights per the spec; apparently distribute is what sets the weights. Outlier detection might not be taking effect here.

Looking at the Envoy clusters might help. Can you share the output of the following command: istioctl proxy-config clusters <pod_name_of_source_workload> -o json

wenxian commented 4 years ago

us-east-1

  "name": "outbound|80||default.greeting-sample-showgtp.global",
        "type": "STRICT_DNS",
        "connectTimeout": "10s",
        "loadAssignment": {
            "clusterName": "outbound|80||default.greeting-sample-showgtp.global",
            "endpoints": [
                {
                    "locality": {
                        "region": "us-east-1"
                    },
                    "lbEndpoints": [
                        {
                            "endpoint": {
                                "address": {
                                    "socketAddress": {
                                        "address": "greeting.sample-showgtp.svc.cluster.local",
                                        "portValue": 80
                                    }
                                }
                            },
                            "loadBalancingWeight": 1
                        }
                    ],
                    "loadBalancingWeight": 50
                },
                {
                    "locality": {
                        "region": "us-west-2"
                    },
                    "lbEndpoints": [
                        {
                            "endpoint": {
                                "address": {
                                    "socketAddress": {
                                        "address": "a5020c7e4380642f09c42334f5d06314-b30f0b24ce995299.elb.us-west-2.amazonaws.com",
                                        "portValue": 15443
                                    }
                                }
                            },
                            "loadBalancingWeight": 1
                        }
                    ],
                    "loadBalancingWeight": 50
                }
            ]
        },
        "circuitBreakers": {
            "thresholds": [
                {
                    "maxConnections": 4294967295,
                    "maxPendingRequests": 4294967295,
                    "maxRequests": 4294967295,
                    "maxRetries": 4294967295
                }
            ]
        },
        "dnsRefreshRate": "5s",
        "respectDnsTtl": true,
        "dnsLookupFamily": "V4_ONLY",
        "outlierDetection": {
            "consecutive5xx": 10,
            "interval": "5s",
            "baseEjectionTime": "120s",
            "enforcingConsecutive5xx": 100
        },
        "commonLbConfig": {
            "healthyPanicThreshold": {},
            "localityWeightedLbConfig": {}
        },
        "transportSocket": {
            "name": "envoy.transport_sockets.tls",
            "typedConfig": {
                "@type": "type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext",
                "commonTlsContext": {
                    "tlsCertificateSdsSecretConfigs": [
                        {
                            "name": "default",
                            "sdsConfig": {
                                "apiConfigSource": {
                                    "apiType": "GRPC",
                                    "grpcServices": [
                                        {
                                            "envoyGrpc": {
                                                "clusterName": "sds-grpc"
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    ],
                    "combinedValidationContext": {
                        "defaultValidationContext": {},
                        "validationContextSdsSecretConfig": {
                            "name": "ROOTCA",
                            "sdsConfig": {
                                "apiConfigSource": {
                                    "apiType": "GRPC",
                                    "grpcServices": [
                                        {
                                            "envoyGrpc": {
                                                "clusterName": "sds-grpc"
                                            }
                                        }
                                    ]
--
                "sni": "outbound_.80_._.default.greeting-sample-showgtp.global"
            }
        },
        "metadata": {
            "filterMetadata": {
                "istio": {
                    "config": "/apis/networking.istio.io/v1alpha3/namespaces/admiral-sync/destination-rule/default.greeting-sample-showgtp.global-default-dr"
                }
            }
        },
        "filters": [
            {
                "name": "istio.metadata_exchange",
                "typedConfig": {
                    "@type": "type.googleapis.com/udpa.type.v1.TypedStruct",
                    "typeUrl": "type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange",
                    "value": {
                        "protocol": "istio-peer-exchange"
                    }
                }
            }
        ]
    },
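The fields that matter in this dump are loadAssignment.endpoints[].locality.region and the per-locality loadBalancingWeight. A small sketch to pull them out, assuming the istioctl proxy-config clusters ... -o json output is a JSON array of cluster objects, as the excerpt suggests; locality_weights is a hypothetical helper.

```python
import json


def locality_weights(clusters_json, cluster_name):
    """Return {region: (locality_weight, ["address:port", ...])} for one cluster.

    `clusters_json` is the text printed by
    `istioctl proxy-config clusters <pod> -o json` (assumed to be a JSON list).
    """
    for cluster in json.loads(clusters_json):
        if cluster.get("name") != cluster_name:
            continue
        result = {}
        for ep in cluster.get("loadAssignment", {}).get("endpoints", []):
            region = ep.get("locality", {}).get("region", "")
            addrs = []
            for lb in ep.get("lbEndpoints", []):
                sock = lb["endpoint"]["address"]["socketAddress"]
                addrs.append("{}:{}".format(sock["address"], sock["portValue"]))
            # Locality-level weight, not the per-endpoint weight inside lbEndpoints.
            result[region] = (ep.get("loadBalancingWeight"), addrs)
        return result
    raise KeyError(cluster_name)
```

For example, save the command output to a file and call locality_weights(open(path).read(), "outbound|80||default.greeting-sample-showgtp.global"). For both dumps in this thread it reports 50 for us-east-1 and 50 for us-west-2, which can then be compared against the To weights in the DestinationRule.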

My setup is us-east-1 (Admiral server and Admiral remote) and us-west-2 (Admiral remote). I see the 50/50 distribution working in west, but not in east.

Requests from the east cluster do go to west (via the west LB), but the west LB appears to still return the east response, so in the end every response looks like it comes from east.
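One way to see why a 50/50 split can still look like "always east" from the east side: assume, as described above, that a request sent to the west LB may be forwarded back to east. Let w_E and w_W be the locality weights in the east sidecar and q the (assumed) probability that the west LB returns a request to east. Then

$$P(\text{served in east} \mid \text{sent from east}) = \frac{w_E}{w_E + w_W} + \frac{w_W}{w_E + w_W}\, q$$

With w_E = w_W = 50 and q = 1 this gives 1/2 + 1/2 = 1, which matches the observation; with q = 0 (the west LB serving from west) it would be the intended 1/2.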

us-west-2

 "name": "outbound|80||default.greeting-sample-showgtp.global",
        "type": "STRICT_DNS",
        "connectTimeout": "10s",
        "loadAssignment": {
            "clusterName": "outbound|80||default.greeting-sample-showgtp.global",
            "endpoints": [
                {
                    "locality": {
                        "region": "us-east-1"
                    },
                    "lbEndpoints": [
                        {
                            "endpoint": {
                                "address": {
                                    "socketAddress": {
                                        "address": "a4e692a23991b478ca62ea84881d79da-53c356a7441bc499.elb.us-east-1.amazonaws.com",
                                        "portValue": 15443
                                    }
                                }
                            },
                            "loadBalancingWeight": 1
                        }
                    ],
                    "loadBalancingWeight": 50
                },
                {
                    "locality": {
                        "region": "us-west-2"
                    },
                    "lbEndpoints": [
                        {
                            "endpoint": {
                                "address": {
                                    "socketAddress": {
                                        "address": "greeting.sample-showgtp.svc.cluster.local",
                                        "portValue": 80
                                    }
                                }
                            },
                            "loadBalancingWeight": 1
                        }
                    ],
                    "loadBalancingWeight": 50
                }
            ]
        },
        "circuitBreakers": {
            "thresholds": [
                {
                    "maxConnections": 4294967295,
                    "maxPendingRequests": 4294967295,
                    "maxRequests": 4294967295,
                    "maxRetries": 4294967295
                }
            ]
        },
        "dnsRefreshRate": "5s",
        "respectDnsTtl": true,
        "dnsLookupFamily": "V4_ONLY",
        "outlierDetection": {
            "consecutive5xx": 10,
            "interval": "5s",
            "baseEjectionTime": "120s",
            "enforcingConsecutive5xx": 100
        },
        "commonLbConfig": {
            "healthyPanicThreshold": {},
            "localityWeightedLbConfig": {}
        },
        "transportSocket": {
            "name": "envoy.transport_sockets.tls",
            "typedConfig": {
                "@type": "type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext",
                "commonTlsContext": {
                    "tlsCertificateSdsSecretConfigs": [
                        {
                            "name": "default",
                            "sdsConfig": {
                                "apiConfigSource": {
                                    "apiType": "GRPC",
                                    "grpcServices": [
                                        {
                                            "envoyGrpc": {
                                                "clusterName": "sds-grpc"
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    ],
                    "combinedValidationContext": {
                        "defaultValidationContext": {},
                        "validationContextSdsSecretConfig": {
                            "name": "ROOTCA",
                            "sdsConfig": {
                                "apiConfigSource": {
                                    "apiType": "GRPC",
                                    "grpcServices": [
                                        {
                                            "envoyGrpc": {
                                                "clusterName": "sds-grpc"
                                            }
                                        }
                                    ]
--
                "sni": "outbound_.80_._.default.greeting-sample-showgtp.global"
            }
        },
        "metadata": {
            "filterMetadata": {
                "istio": {
                    "config": "/apis/networking.istio.io/v1alpha3/namespaces/admiral-sync/destination-rule/default.greeting-sample-showgtp.global-default-dr"
                }
            }
        },
        "filters": [
            {
                "name": "istio.metadata_exchange",
                "typedConfig": {
                    "@type": "type.googleapis.com/udpa.type.v1.TypedStruct",
                    "typeUrl": "type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange",
                    "value": {
                        "protocol": "istio-peer-exchange"
                    }
                }
            }
        ]
    },

Running curl -HHost:default.greeting-sample-showgtp.global a5020c7e4380642f09c42334f5d06314-b30f0b24ce995299.elb.us-west-2.amazonaws.com (against the west ELB) always returns the east response.
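To repeat the curl check many times and count where responses come from, here is a minimal sketch using Python's standard library. It assumes (hypothetically) that the response body contains the name of the region that served it; fetch, classify and tally are illustrative names.

```python
import urllib.request
from collections import Counter


def fetch(url, host, timeout=5.0):
    """GET `url` with an explicit Host header, like curl -HHost:<host>."""
    req = urllib.request.Request(url, headers={"Host": host})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def classify(body, markers=("us-east-1", "us-west-2")):
    """Hypothetical: assumes the body names the region that served it."""
    for marker in markers:
        if marker in body:
            return marker
    return "unknown"


def tally(url, host, n=20, get=fetch):
    """Send n requests and count responses per serving region."""
    return Counter(classify(get(url, host)) for _ in range(n))
```

For example, tally("http://<west-elb-hostname>", "default.greeting-sample-showgtp.global") should report a roughly even split if the 50/50 weights were being honored end to end; the get parameter makes it easy to swap in a stub when testing the counting logic.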

--- UPDATE ---

I found that if the west cluster has the larger weight, requests from west always stay in the west cluster.

- (US-EAST-1 >= 50, US-WEST-2) -> requests from East always return East; West is good.
- (US-WEST-2 > 50, US-EAST-1) -> requests from West always return West; East is good.

This means that if a cluster (locality) has the larger weight, requests originating from that cluster may always stay in that cluster, because the LB always resolves to its own cluster.