istio / ztunnel

The `ztunnel` component of ambient mesh
Apache License 2.0

Ambient mode getting started fails due to proxy protocol in 1.22.1 #1124

Open feynmanliang opened 1 month ago

feynmanliang commented 1 month ago

When I follow https://istio.io/latest/docs/ambient/getting-started/, I expect the first `kubectl exec deploy/sleep -- curl -s "http://$GATEWAY_HOST/productpage" | grep -o "<title>.*</title>"` command to succeed.

Instead, the command fails with a 503 and the logs on the productpage-v1 pod complain about a proxy protocol connection. Reverting to 1.21.0 fixes the issue and I can follow the guide as expected.

I tracked this down to https://github.com/istio/ztunnel/pull/850. Is there something I should be doing to undo the proxy protocol encapsulation on the upstream end of an HBONE connection?
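For context on what "undoing" the encapsulation involves: a PROXY protocol v1 header (the `version: V1` used by Istio's `proxyProtocol` DestinationRule setting) is one ASCII line of the form `PROXY <TCP4|TCP6|UNKNOWN> <src> <dst> <srcport> <dstport>\r\n`, sent before any application bytes. A receiver that understands PROXY has to consume that line before handing the rest of the stream to the application. The following is a minimal Python sketch of that step under those assumptions. It is not Istio or ztunnel code. `strip_proxy_v1` and `ProxyV1Header` are hypothetical names, and only the text (v1) format is handled, not the binary v2 format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Per the PROXY protocol spec, a v1 header line (including CRLF) is at most 107 bytes.
MAX_V1_HEADER_LEN = 107


@dataclass
class ProxyV1Header:
    family: str                      # "TCP4", "TCP6" or "UNKNOWN"
    src_addr: Optional[str] = None
    dst_addr: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


def strip_proxy_v1(data: bytes) -> Tuple[Optional[ProxyV1Header], bytes]:
    """Split a leading PROXY v1 header off the start of a byte stream.

    Returns (header, payload). If the data does not begin with "PROXY ",
    it is returned unchanged with header=None.
    """
    if not data.startswith(b"PROXY "):
        return None, data
    # The header ends at the first CRLF, which must appear within the size limit.
    end = data.find(b"\r\n", 0, MAX_V1_HEADER_LEN)
    if end == -1:
        raise ValueError("PROXY v1 header missing CRLF within 107 bytes")
    fields = data[:end].decode("ascii").split(" ")
    payload = data[end + 2:]
    if fields[1] == "UNKNOWN":
        # No usable addresses; the receiver should fall back to the real socket addresses.
        return ProxyV1Header("UNKNOWN"), payload
    if len(fields) != 6 or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError(f"malformed PROXY v1 header: {fields!r}")
    _, family, src, dst, sport, dport = fields
    return ProxyV1Header(family, src, dst, int(sport), int(dport)), payload


if __name__ == "__main__":
    # Header copied from the productpage log in this issue; the HTTP request after it is hypothetical.
    raw = b"PROXY TCP4 10.42.0.62 10.42.0.65 33169 80\r\nGET /productpage HTTP/1.1\r\n\r\n"
    header, rest = strip_proxy_v1(raw)
    print(header)
    print(rest)
```

A backend that does not perform this step sees the `PROXY ...` line as the first line of the HTTP request.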

howardjohn commented 1 month ago

The PROXY stuff should not happen at all unless you set `ambient.istio.io/waypoint-inbound-binding` on the GatewayClass, which is not even documented anywhere. Do you have some sort of customization? What is the exact error?

feynmanliang commented 1 month ago

Thanks for taking a look - appreciate your help @howardjohn.

feynman@Luping-Home:~/code/istio-1.22.1/bin$ ./istioctl  version
client version: 1.22.1
control plane version: 1.22.1
data plane version: 1.22.1 (5 proxies)
feynman@Luping-Home:~/code/istio-1.22.1/bin$ kubectl version
Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.1+k3s1
WARNING: version difference between client (1.28) and server (1.30) exceeds the supported minor version skew of +/-1

Where this is happening:

This is how I configure my mesh:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: istiod
spec:
  interval: 5m
  chart:
    spec:
      chart: istiod
      version: "1.22.1"
      sourceRef:
        kind: HelmRepository
        name: istio
      interval: 1m
  values:
    profile: ambient
    pilot:
      env:
        PILOT_ENABLE_ALPHA_GATEWAY_API: "true"
      resources:
        requests:
          cpu: 70m
          memory: 163M
        limits:
          memory: 163M
    meshConfig:
      enablePrometheusMerge: false
      ingressControllerMode: "DEFAULT"
      protocolDetectionTimeout: 5s
      extensionProviders:
      - name: otel
        envoyOtelAls:
          service: opentelemetry-collector.o11y.svc.cluster.local
          port: 4317
      - name: otel-tracing
        opentelemetry:
          port: 4317
          service: opentelemetry-collector.o11y.svc.cluster.local
      - name: "oauth2-proxy"
        envoyExtAuthzHttp:
          service: "oauth2-proxy.blueteam.svc.cluster.local"
          port: "80"
          includeRequestHeadersInCheck: ["authorization", "cookie"] # headers sent to the oauth2-proxy in the check request.
          headersToUpstreamOnAllow: ["authorization", "path", "x-auth-request-user", "x-auth-request-email", "x-auth-request-access-token"] # headers sent to backend application when request is allowed.
          headersToDownstreamOnAllow: ["set-cookie"] # headers sent back to the client when request is allowed.
          headersToDownstreamOnDeny: ["content-type", "set-cookie"] # headers sent back to the client when request is denied.
    global:
      variant: distroless
      proxy:
        resources:
          requests:
            cpu: 30m
            memory: 132M
      proxy_init:
        image: istio-proxy
  dependsOn:
  - name: istio-base

My GatewayClass is the default from the getting started page:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  creationTimestamp: "2024-06-07T21:08:45Z"
  generation: 1
  name: istio
  resourceVersion: "2319"
  uid: 84e88795-94b8-434e-a6c4-f694d2d01fbf
spec:
  controllerName: istio.io/gateway-controller
  description: The default Istio GatewayClass
status:
  conditions:
  - lastTransitionTime: "2024-06-07T21:08:45Z"
    message: Handled by Istio controller
    observedGeneration: 1
    reason: Accepted
    status: "True"
    type: Accepted
  supportedFeatures:
  - GRPCRoute
  - Gateway
  - GatewayHTTPListenerIsolation
  - GatewayPort8080
  - GatewayStaticAddresses
  - HTTPRoute
  - HTTPRouteBackendProtocolH2C
  - HTTPRouteBackendProtocolWebSocket
  - HTTPRouteBackendRequestHeaderModification
  - HTTPRouteBackendTimeout
  - HTTPRouteDestinationPortMatching
  - HTTPRouteHostRewrite
  - HTTPRouteMethodMatching
  - HTTPRouteParentRefPort
  - HTTPRoutePathRedirect
  - HTTPRoutePathRewrite
  - HTTPRoutePortRedirect
  - HTTPRouteQueryParamMatching
  - HTTPRouteRequestMirror
  - HTTPRouteRequestMultipleMirrors
  - HTTPRouteRequestTimeout
  - HTTPRouteResponseHeaderModification
  - HTTPRouteSchemeRedirect
  - Mesh
  - MeshClusterIPMatching
  - MeshConsumerRoute
  - ReferenceGrant
  - TLSRoute

That label isn't present in the generated Gateway either:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  annotations:
    gateway.istio.io/controller-version: "5"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gateway.networking.k8s.io/v1beta1","kind":"Gateway","metadata":{"annotations":{},"name":"bookinfo-gateway","namespace":"default"},"spec":{"gatewayClassName":"istio","listeners":[{"allowedRoutes":{"namespaces":{"from":"Same"}},"name":"http","port":80,"protocol":"HTTP"}]}}
    networking.istio.io/service-type: ClusterIP
  creationTimestamp: "2024-06-07T21:38:07Z"
  generation: 1
  name: bookinfo-gateway
  namespace: default
  resourceVersion: "11027"
  uid: 4740f118-3baa-47a5-a32f-244f7de05b32
spec:
  gatewayClassName: istio
  listeners:
  - allowedRoutes:
      namespaces:
        from: Same
    name: http
    port: 80
    protocol: HTTP
status:
  addresses:
  - type: Hostname
    value: bookinfo-gateway-istio.default.svc.cluster.local
  conditions:
  - lastTransitionTime: "2024-06-07T21:38:07Z"
    message: Resource accepted
    observedGeneration: 1
    reason: Accepted
    status: "True"
    type: Accepted
  - lastTransitionTime: "2024-06-07T21:38:37Z"
    message: Resource programmed, assigned to service(s) bookinfo-gateway-istio.default.svc.cluster.local:80
    observedGeneration: 1
    reason: Programmed
    status: "True"
    type: Programmed
  listeners:
  - attachedRoutes: 1
    conditions:
    - lastTransitionTime: "2024-06-07T21:38:07Z"
      message: No errors found
      observedGeneration: 1
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2024-06-07T21:38:07Z"
      message: No errors found
      observedGeneration: 1
      reason: NoConflicts
      status: "False"
      type: Conflicted
    - lastTransitionTime: "2024-06-07T21:38:07Z"
      message: No errors found
      observedGeneration: 1
      reason: Programmed
      status: "True"
      type: Programmed
    - lastTransitionTime: "2024-06-07T21:38:07Z"
      message: No errors found
      observedGeneration: 1
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
    name: http
    supportedKinds:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
    - group: gateway.networking.k8s.io
      kind: GRPCRoute

Relevant logs:

On ztunnel:

2024-06-07T21:46:01.380119Z    info    access    connection complete    src.addr=10.42.0.65:56178 src.workload=bookinfo-gateway-istio-6d758c5f59-hmjgj src.namespace=default src.identity="spiffe://cluster.local/ns/default/sa/bookinfo-gateway-istio" dst.addr=10.42.0.59:9080 dst.hbone_addr=10.42.0.59:9080 dst.service=productpage-v1.default.svc.cluster.local dst.workload=productpage-v1-78787b7cdd-4j55j dst.namespace=default dst.identity="spiffe://cluster.local/ns/default/sa/bookinfo-productpage" direction="inbound" bytes_sent=358 bytes_recv=1401 duration="11ms"
2024-06-07T21:46:01.380483Z    info    access    connection complete    src.addr=10.42.0.62:53622 src.workload=sleep-5577c64d7c-bd2ng src.namespace=default dst.addr=10.42.0.65:80 dst.service=bookinfo-gateway-istio.default.svc.cluster.local dst.workload=bookinfo-gateway-istio-6d758c5f59-hmjgj dst.namespace=default direction="outbound" bytes_sent=104 bytes_recv=219 duration="22ms"

No logs on bookinfo-gateway

On productpage-v1:

ERROR:werkzeug:::ffff:10.42.0.65 - - [07/Jun/2024 21:46:01] code 400, message Bad request version ('80')
INFO:werkzeug:::ffff:10.42.0.65 - - [07/Jun/2024 21:46:01] "PROXY TCP4 10.42.0.62 10.42.0.65 33169 80" 400 -
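The productpage error is what you would expect if a PROXY v1 line reaches an HTTP server that does not understand it. The addresses in the logged header match the ztunnel logs: 10.42.0.62 is the sleep pod and 10.42.0.65 is the bookinfo-gateway pod, with destination port 80. Judging by the `werkzeug` log prefix, productpage runs on Werkzeug's development server, which builds on Python's `http.server.BaseHTTPRequestHandler`. That handler splits the first line on whitespace. When the line has three or more words, it treats the last word as the HTTP version, which here is the port `80`. The hypothetical `http_server_verdict` below is a simplified model of that check, not the standard-library code itself.

```python
def http_server_verdict(request_line: str) -> str:
    """Simplified model of how Python's http.server.BaseHTTPRequestHandler
    validates a request line. Only the version and word-count checks are
    modelled; the real method also parses version numbers and HTTP/0.9."""
    words = request_line.split()
    if len(words) >= 3:
        # With three or more words, the last word is taken as the HTTP version.
        version = words[-1]
        if not version.startswith("HTTP/"):
            return f"code 400, message Bad request version ({version!r})"
    if not 2 <= len(words) <= 3:
        return f"code 400, message Bad request syntax ({request_line!r})"
    return "accepted"


if __name__ == "__main__":
    # The first line productpage received, copied from the werkzeug log above.
    print(http_server_verdict("PROXY TCP4 10.42.0.62 10.42.0.65 33169 80"))
    # An ordinary request line for comparison.
    print(http_server_verdict("GET /productpage HTTP/1.1"))
```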
howardjohn commented 1 month ago

Thanks for the info

Can you show the DR (DestinationRule) config and the flow of traffic? Do you have a config doing PROXY from bookinfo-gateway to productpage-v1? If so, why? What was terminating the PROXY before (or was there no 'before')?

feynmanliang commented 1 month ago

There was no 'before'; I am using the bookinfo app as a mutual repro.

All DRs:

feynman@Luping-Home:~$ kubectl get destinationrule -A -o yaml
apiVersion: v1
items:
- apiVersion: networking.istio.io/v1
  kind: DestinationRule
  metadata:
    creationTimestamp: "2024-06-07T21:31:23Z"
    generation: 1
    labels:
      kustomize.toolkit.fluxcd.io/name: XXXX
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: NOT_PRODUCTPAGEV1-proxy-protocol
    namespace: XXXX
    resourceVersion: "8133"
    uid: d7dc2555-7f53-4c48-a4d3-1d30e63ff117
  spec:
    host: NOT.PRODUCTPAGEV1.svc.cluster.local
    trafficPolicy:
      proxyProtocol:
        version: V1
kind: List
metadata:
  resourceVersion: ""

This rule is intended to target a separate service that does terminate proxy protocol.

> and flow of traffic?

Is there a command I should run for this?

howardjohn commented 1 month ago

Ohh, I get it now; I totally misunderstood. Thanks, that certainly looks odd. The DR shouldn't apply, since it doesn't match the host, as you said.

feynmanliang commented 3 weeks ago

Could this change be related? https://github.com/istio/istio/blame/master/pilot/pkg/networking/core/cluster_traffic_policy.go#L57

It's present in the broken 1.22.1 release (https://github.com/istio/istio/commit/b531f2081d99835bba85fe947b7eef06a55aeb0f) but absent from the working 1.21.3 release we've rolled back to.

howardjohn commented 3 weeks ago

BTW, setting the pilot memory limit to 163M is a recipe for a cluster outage, with istiod getting OOMKilled for no reason. That is not very much memory for istiod, and its usage can easily spike.

howardjohn commented 3 weeks ago

I cannot reproduce it. My steps:

istioctl install --set profile=ambient
kubectl apply -f tests/integration/pilot/testdata/gateway-api-crd.yaml
kubectl label ns default istio.io/dataplane-mode=ambient istio-injection- --overwrite
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f samples/bookinfo/platform/kube/bookinfo-versions.yaml
cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: random-proxy-protocol
  namespace: other
spec:
  host: random.other.svc.cluster.local
  trafficPolicy:
    proxyProtocol:
      version: V1
EOF

Curl of productpage works fine, no PROXY.

Can you give steps to reproduce?

feynmanliang commented 3 weeks ago

Sure thing, will give it a shot asap.

(Replying by email to John Howard's comment of Tue, Jun 11, 2024 at 12:10 PM.)