knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Istio periodic tests are failing #15267

Closed skonto closed 5 months ago

skonto commented 5 months ago

See: https://testgrid.k8s.io/r/knative-own-testgrid/serving#istio-latest-no-mesh&show-stale-tests=

ReToCode commented 5 months ago

Hm I think this was introduced with https://github.com/knative/serving/commit/3b35f54c78787ed8b0903df2d9d46032c0bdafc6:

stream.go:305: E 01:35:56.089 activator-79648bc595-mp8f5 [activator] [serving-tests/b-y-o-certificate-glgpxzei-00001] Failed to probe clusterIP None:80 err=error roundtripping http://None:80/healthz: dial tcp: lookup None on 10.65.80.10:53: no such host

via https://github.com/knative/serving/blob/499dc1d51f5f802bc9e5f340977f0873e960e283/pkg/activator/net/revision_backends.go#L203

This does not seem to work for istio. Any idea how this is/was supposed to work? @skonto @dprotaso?
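The log line above shows the activator building a probe URL from the literal string `None`, which is the `clusterIP` value Kubernetes assigns to a headless Service. As a minimal sketch only (not the fix adopted in this thread, and not the code in `revision_backends.go`), a guard could skip the clusterIP probe when no usable address exists. The function name `clusterIPProbeURL` and the local constant are hypothetical; the constant mirrors the value of `corev1.ClusterIPNone`.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// clusterIPNone mirrors corev1.ClusterIPNone ("None"), the value a headless
// Service reports as its clusterIP. Declared locally to keep the sketch
// free of Kubernetes dependencies.
const clusterIPNone = "None"

// clusterIPProbeURL is a hypothetical helper. It returns the health-check URL
// for a Service clusterIP, or ok=false when the Service is headless (or has no
// clusterIP yet) and therefore cannot be probed through a cluster IP.
func clusterIPProbeURL(clusterIP string, port int) (url string, ok bool) {
	if clusterIP == "" || clusterIP == clusterIPNone {
		return "", false
	}
	// JoinHostPort adds brackets for IPv6 addresses.
	return "http://" + net.JoinHostPort(clusterIP, strconv.Itoa(port)) + "/healthz", true
}

func main() {
	for _, ip := range []string{"10.0.0.7", "None"} {
		if u, ok := clusterIPProbeURL(ip, 80); ok {
			fmt.Println("probe", u)
		} else {
			fmt.Println("skip clusterIP probe for", ip)
		}
	}
}
```

With such a guard the activator would not attempt a DNS lookup for `None`, but it would still have no fallback path when pod probing fails, which is the underlying problem discussed in this issue.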

skonto commented 5 months ago

It is going through the clusterIP because pod probing fails first (the probing logic tries several options in turn): " stream.go:305: W 10:34:42.072 activator-58748d654c-zz4bw [activator] [serving-tests/tag-header-based-routing-myoskcfv-00001] Failed probing pods err=unexpected status code: want [200], got 503 " In the other tests, pod probing probably never failed. If we remove the clusterIP from the private service, then we should also remove that probing option, but I feel it was added for a reason.

ReToCode commented 5 months ago

@dprotaso @izabelacg I think that change has side effects for meshes when direct pod addressability is not possible. We currently have this:

    # If true, networking plugins can add additional information to deployed
    # applications to make their pods directly accessible via their IPs even if mesh is
    # enabled and thus direct-addressability is usually not possible.
    # Consumers like Knative Serving can use this setting to adjust their behavior
    # accordingly, i.e. to drop fallback solutions for non-pod-addressable systems.
    #
    # NOTE: This flag is in an alpha state and is mostly here to enable internal testing
    #       for now. Use with caution.
    enable-mesh-pod-addressability: "false"

    # mesh-compatibility-mode indicates whether consumers of network plugins
    # should directly contact Pod IPs (most efficient), or should use the
    # Cluster IP (less efficient, needed when mesh is enabled unless
    # `enable-mesh-pod-addressability`, above, is set).
    # Permitted values are:
    #  - "auto" (default): automatically determine which mesh mode to use by trying Pod IP and falling back to Cluster IP as needed.
    #  - "enabled": always use Cluster IP and do not attempt to use Pod IPs.
    #  - "disabled": always use Pod IPs and do not fall back to Cluster IP on failure.
    mesh-compatibility-mode: "auto"

With this set to auto, the code tries the pod directly first, then falls back to the ClusterIP. I'm still testing with Istio, but with mesh enabled this is now, IMHO, broken. I think we'd need to rework how this behaves before we can make the private service headless.
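To make the three documented `mesh-compatibility-mode` values concrete, here is a rough sketch of the selection logic as the config comments describe it. It is an illustration under assumptions, not the activator's implementation: `pickBackend`, `probeFunc`, `errProbeFailed`, and the `meshMode` constants are hypothetical names, and the probes are plain callbacks rather than real HTTP checks.

```go
package main

import (
	"errors"
	"fmt"
)

// meshMode mirrors the three values documented for mesh-compatibility-mode.
type meshMode string

const (
	meshAuto     meshMode = "auto"
	meshEnabled  meshMode = "enabled"
	meshDisabled meshMode = "disabled"
)

// probeFunc is a hypothetical stand-in for a health probe; nil means healthy.
type probeFunc func() error

// errProbeFailed is a placeholder error used by the example probes.
var errProbeFailed = errors.New("probe failed")

// pickBackend returns "pod" or "clusterIP" according to the documented modes:
// "enabled" uses only the cluster IP, "disabled" uses only pod IPs, and
// "auto" tries pod IPs first and falls back to the cluster IP.
func pickBackend(mode meshMode, probePods, probeClusterIP probeFunc) (string, error) {
	switch mode {
	case meshEnabled:
		if err := probeClusterIP(); err != nil {
			return "", fmt.Errorf("cluster IP probe: %w", err)
		}
		return "clusterIP", nil
	case meshDisabled:
		if err := probePods(); err != nil {
			return "", fmt.Errorf("pod probe: %w", err)
		}
		return "pod", nil
	case meshAuto:
		podErr := probePods()
		if podErr == nil {
			return "pod", nil
		}
		// Fallback path: with a headless private service there is no
		// cluster IP to probe, so this step fails as well.
		if err := probeClusterIP(); err != nil {
			return "", fmt.Errorf("pod probe: %v; cluster IP probe: %w", podErr, err)
		}
		return "clusterIP", nil
	default:
		return "", fmt.Errorf("unknown mesh-compatibility-mode %q", mode)
	}
}

func main() {
	failing := func() error { return errProbeFailed }
	// Mesh blocks direct pod access and the service is headless: both probes fail.
	_, err := pickBackend(meshAuto, failing, failing)
	fmt.Println("auto with mesh + headless service:", err)
}
```

The sketch shows why a headless private service conflicts with "auto": once the pod probe fails under a mesh, the only remaining option depends on a cluster IP that no longer exists.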

ReToCode commented 5 months ago

Tested locally, direct pod-addressability does not work with istio (mesh) + headless services. Seems there has been an issue about this which is closed but not really resolved: https://github.com/istio/istio/issues/7495.

dprotaso commented 5 months ago

@izabelacg can you revert https://github.com/knative/serving/pull/15170 for now

dprotaso commented 5 months ago

> Tested locally, direct pod-addressability does not work with istio (mesh) + headless services. Seems there has been an issue about this which is closed but not really resolved: istio/istio#7495.

I believe Istio ambient should work. I also think it might work if net-istio were to create a destination rule, but I haven't dug into this.

skonto commented 5 months ago

It also seems that direct pod addressability does not work with strict mTLS; we fall back to the clusterIP downstream. Not sure if we enable passthrough. This is with the normal service (not headless).

dprotaso commented 5 months ago

Pod addressability in mesh issue is here - https://github.com/knative/serving/issues/10751

I have some notes from talking to an Istio maintainer; I will post them there.