knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Unable to use Proxy Protocol in Knative service #14718

Open jdiazper opened 10 months ago

jdiazper commented 10 months ago

Ask your question here:

First of all thanks for the fantastic work you are doing with Knative!

I have been trying to enable proxy protocol so I can read the requester's source IP address from the headers. When I configure Istio to enable proxy protocol on my Load Balancer following TCP/UDP Proxy Load Balancer, and deploy the httpbin service and gateway as per the Before you begin section, everything works fine. However, when I try to access my Knative Virtual Services I get upstream connect error or disconnect/reset before headers. reset reason: connection termination

Steps to reproduce this issue:

Preparation:

  1. Create file proxy-protocol.yaml:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    accessLogFile: /dev/stdout
    defaultConfig:
      gatewayTopology:
        proxyProtocol: {}
  components:
    ingressGateways:
    - enabled: true
      name: istio-ingressgateway
      k8s:
        hpaSpec:
          maxReplicas: 1
          minReplicas: 1
        serviceAnnotations:
          service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
  2. Deploy Istio using istioctl
istioctl install -f proxy-protocol.yaml
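
As a quick sanity check (assuming the default gateway Service name istio-ingressgateway), you can verify that the proxy-protocol annotation landed on the gateway Service:

kubectl -n istio-system get svc istio-ingressgateway -o yaml | grep aws-load-balancer-proxy-protocol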
  3. Install the Knative Operator
kubectl apply -f https://github.com/knative/operator/releases/download/knative-v1.12.1/operator.yaml

Verify it has been successfully installed

kubectl get deployment knative-operator

output

NAME               READY   UP-TO-DATE   AVAILABLE   AGE
knative-operator   1/1     1            1           1m
  4. Install Knative Serving

Create knative-serving.yaml file

apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving

Deploy knative-serving.yaml file

kubectl apply -f knative-serving.yaml

Install Knative net-istio

kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.12.0/net-istio.yaml

Verify the Knative Serving deployment

kubectl get deployment -n knative-serving

output

NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
activator              1/1     1            1           18s
autoscaler             1/1     1            1           18s
autoscaler-hpa         1/1     1            1           14s
controller             1/1     1            1           18s
domain-mapping         1/1     1            1           12s
domainmapping-webhook  1/1     1            1           12s
webhook                1/1     1            1           17s

Check the status of Knative Serving Custom Resource

kubectl get KnativeServing knative-serving -n knative-serving
NAME              VERSION             READY   REASON
knative-serving   1.12.2              True
  5. Install Knative Eventing

Create file knative-eventing.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: knative-eventing
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing

Deploy Knative Eventing

kubectl apply -f knative-eventing.yaml

Verify status of Knative Eventing

kubectl get deployment -n knative-eventing

output

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
eventing-controller     1/1     1            1           109s
eventing-webhook        1/1     1            1           108s
imc-controller          1/1     1            1           104s
imc-dispatcher          1/1     1            1           104s
mt-broker-controller    1/1     1            1           102s
mt-broker-filter        1/1     1            1           103s
mt-broker-ingress       1/1     1            1           102s
pingsource-mt-adapter   0/0     0            0           109s

Check the status of Knative Eventing Custom Resource

kubectl get KnativeEventing knative-eventing -n knative-eventing
NAME               VERSION   READY   REASON
knative-eventing   1.12.1    True

Execution

  1. Test that the requester IP address is captured by the httpbin service
kubectl create ns foo
kubectl label namespace foo istio-injection=enabled
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin.yaml -n foo
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin-gateway.yaml -n foo

Output:

namespace/foo created
namespace/foo labeled
serviceaccount/httpbin created
service/httpbin created
deployment.apps/httpbin created
gateway.networking.istio.io/httpbin-gateway created
virtualservice.networking.istio.io/httpbin created

Validate the service works and returns the requester IP address.

Get LoadBalancer DNS

kubectl -n istio-system get svc

output

NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP                                                                  PORT(S)                                      AGE
istio-ingressgateway    LoadBalancer   172.20.100.198   a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com   15021:30426/TCP,80:30439/TCP,443:32439/TCP   3h2m
istiod                  ClusterIP      172.20.118.70    <none>                                                                       15010/TCP,15012/TCP,443/TCP,15014/TCP        3h3m
knative-local-gateway   ClusterIP      172.20.9.220     <none>                                                                       80/TCP                                       75m

curl http://a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com/headers

Output:

{
  "headers": {
    "Accept": "*/*", 
    "Host": "a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com", 
    "User-Agent": "curl/7.81.0", 
    "X-B3-Parentspanid": "4249c5844c849c94", 
    "X-B3-Sampled": "0", 
    "X-B3-Spanid": "62708cf2ea8b163f", 
    "X-B3-Traceid": "a09d3946b14b5e6f4249c5844c849c94", 
    "X-Envoy-Attempt-Count": "1", 
    "X-Envoy-External-Address": "XXX.XXX.XXX.XXX", 
    "X-Forwarded-Client-Cert": "By=spiffe://cluster.local/ns/foo/sa/httpbin;Hash=d882a4cea12e31a0f7510971fced251f37b4ee78459e560c5b00268ea4497473;Subject=\"\";URI=spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"
  }
}

So far everything seems to be working great, but I am still not using Knative.

  2. Prepare the Knative deployment files

namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: test-service
  labels:
    istio-injection: enabled

service.yaml

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: backend
  namespace: test-service
  labels:
    networking.knative.dev/visibility: "cluster-local"
    networking.knative.dev/disable-auto-tls: "true"
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/initial-scale: "1"
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "10"
    spec:
      containers:
        - image: fake-registry.github.com/fakepath/backend:1.0.0
          env:
            - name: ENVIRONMENT
              value: test

virtualservice.yaml

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: test-service
  namespace: test-service
spec:
  # This is the gateway shared in knative service mesh.
  gateways:
  - knative-serving/knative-ingress-gateway
  # Set host to the domain name that you own.
  hosts:
  - a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com
  http:
  - match:
    - uri:
        prefix: "/"
    rewrite:
      authority: backend.test-service.svc.cluster.local
    route:
      - destination:
          host: knative-local-gateway.istio-system.svc.cluster.local
          port:
            number: 80
        weight: 100

Deploy Namespace

kubectl apply -f namespace.yaml

Deploy Service

kubectl apply -f service.yaml

Deploy Virtual Service

kubectl apply -f virtualservice.yaml

Validate everything is deployed and working.

kubectl -n test-service get all

Output

NAME                                                       READY   STATUS    RESTARTS   AGE
pod/backend-00001-deployment-d85fffd57-hmcbm   3/3     Running   0          119m

NAME                                        TYPE           CLUSTER-IP      EXTERNAL-IP                                            PORT(S)                                              AGE
service/backend                 ExternalName   <none>          knative-local-gateway.istio-system.svc.cluster.local   80/TCP                                               119m
service/backend-00001           ClusterIP      172.20.73.173   <none>                                                 80/TCP,443/TCP                                       119m
service/backend-00001-private   ClusterIP      172.20.1.118    <none>                                                 80/TCP,443/TCP,9090/TCP,9091/TCP,8022/TCP,8012/TCP   119m

NAME                                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/backend-00001-deployment   1/1     1            1           119m

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/backend-00001-deployment-d85fffd57   1         1         1       119m

NAME                                            URL                                                     READY   REASON
route.serving.knative.dev/backend   http://backend.test-service.svc.cluster.local   True    

NAME                                              URL                                                     LATESTCREATED               LATESTREADY                 READY   REASON
service.serving.knative.dev/backend   http://backend.test-service.svc.cluster.local   backend-00001   backend-00001   True    

NAME                                                     CONFIG NAME           K8S SERVICE NAME   GENERATION   READY   REASON   ACTUAL REPLICAS   DESIRED REPLICAS
revision.serving.knative.dev/backend-00001   backend                      1            True             1                 1

NAME                                                    LATESTCREATED               LATESTREADY                 READY   REASON
configuration.serving.knative.dev/backend   backend-00001   backend-00001   True  

Testing the knative service

curl http://a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com/

output

upstream connect error or disconnect/reset before headers. reset reason: connection termination

Therefore I looked at the istio-proxy logs in the istio-ingressgateway service.

kubectl -n istio-system logs -f -c istio-proxy service/istio-ingressgateway

The log for that specific request is

{"response_flags":"UC","user_agent":"curl/7.81.0","upstream_service_time":null,"upstream_local_address":"10.99.133.213:58216","route_name":null,"requested_server_name":null,"upstream_cluster":"outbound|80||knative-local-gateway.istio-system.svc.cluster.local","response_code":503,"protocol":"HTTP/1.1","upstream_host":"10.99.133.213:8081","response_code_details":"upstream_reset_before_response_started{connection_termination}","upstream_transport_failure_reason":null,"downstream_local_address":"10.0.48.195:80","duration":0,"path":"/","start_time":"2023-12-07T16:12:58.761Z","request_id":"11b6caf3-2d55-402a-b53f-2d05079d5f32","method":"GET","authority":"backend.test-service.svc.cluster.local","connection_termination_details":null,"downstream_remote_address":"XXX.XXX.XXX.XXX:18899","bytes_sent":95,"x_forwarded_for":"XXX.XXX.XXX.XXX","bytes_received":0}

So, doing some research in the knative/networking repo issues, I saw this, so I decided to test the EnvoyFilter too.

EnvoyFilter.yaml

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: proxy-protocol
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: LISTENER
    patch:
      operation: MERGE
      value:
        listener_filters:
        - name: envoy.filters.listener.proxy_protocol
        - name: envoy.filters.listener.tls_inspector

Apply the filter

kubectl apply -f EnvoyFilter.yaml

Then I restarted pods in istio-system and test-service namespaces.

kubectl -n istio-system delete pods --all
kubectl -n test-service delete pods --all

However, the istio-ingressgateway pod never becomes healthy. Looking at the logs I can see:

2023-12-07T16:37:58.805990Z     info    FLAG: --concurrency="0"
2023-12-07T16:37:58.806037Z     info    FLAG: --domain="istio-system.svc.cluster.local"
2023-12-07T16:37:58.806043Z     info    FLAG: --help="false"
2023-12-07T16:37:58.806046Z     info    FLAG: --log_as_json="false"
2023-12-07T16:37:58.806049Z     info    FLAG: --log_caller=""
2023-12-07T16:37:58.806132Z     info    FLAG: --log_output_level="default:info"
2023-12-07T16:37:58.806139Z     info    FLAG: --log_rotate=""
2023-12-07T16:37:58.806142Z     info    FLAG: --log_rotate_max_age="30"
2023-12-07T16:37:58.806145Z     info    FLAG: --log_rotate_max_backups="1000"
2023-12-07T16:37:58.806158Z     info    FLAG: --log_rotate_max_size="104857600"
2023-12-07T16:37:58.806161Z     info    FLAG: --log_stacktrace_level="default:none"
2023-12-07T16:37:58.806169Z     info    FLAG: --log_target="[stdout]"
2023-12-07T16:37:58.806173Z     info    FLAG: --meshConfig="./etc/istio/config/mesh"
2023-12-07T16:37:58.806176Z     info    FLAG: --outlierLogPath=""
2023-12-07T16:37:58.806180Z     info    FLAG: --profiling="true"
2023-12-07T16:37:58.806183Z     info    FLAG: --proxyComponentLogLevel="misc:error"
2023-12-07T16:37:58.806190Z     info    FLAG: --proxyLogLevel="warning"
2023-12-07T16:37:58.806193Z     info    FLAG: --serviceCluster="istio-proxy"
2023-12-07T16:37:58.806197Z     info    FLAG: --stsPort="0"
2023-12-07T16:37:58.806200Z     info    FLAG: --templateFile=""
2023-12-07T16:37:58.806203Z     info    FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2023-12-07T16:37:58.806207Z     info    FLAG: --vklog="0"
2023-12-07T16:37:58.806211Z     info    Version 1.20.0-6869a6c2371e21c847d216065cf5c59863d01b4c-Clean
2023-12-07T16:37:58.809356Z     info    Maximum file descriptors (ulimit -n): 1048576
2023-12-07T16:37:58.809844Z     info    Proxy role      ips=[10.99.124.88] type=router id=istio-ingressgateway-8fccfb4b9-f5t9q.istio-system domain=istio-system.svc.cluster.local
2023-12-07T16:37:58.809930Z     info    Apply mesh config from file accessLogEncoding: JSON
accessLogFile: /dev/stdout
defaultConfig:
  discoveryAddress: istiod.istio-system.svc:15012
  gatewayTopology:
    proxyProtocol: {}
  proxyMetadata: {}
  tracing:
    zipkin:
      address: zipkin.istio-system:9411
defaultProviders:
  metrics:
  - prometheus
enablePrometheusMerge: true
rootNamespace: istio-system
trustDomain: cluster.local
2023-12-07T16:37:58.812539Z     info    cpu limit detected as 2, setting concurrency
2023-12-07T16:37:58.813227Z     info    Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
gatewayTopology:
  proxyProtocol: {}
proxyAdminPort: 15000
serviceCluster: istio-proxy
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 5s
tracing:
  zipkin:
    address: zipkin.istio-system:9411

2023-12-07T16:37:58.813249Z     info    JWT policy is third-party-jwt
2023-12-07T16:37:58.813254Z     info    using credential fetcher of JWT type in cluster.local trust domain
2023-12-07T16:37:58.816122Z     info    platform detected is AWS
2023-12-07T16:37:58.819499Z     info    Opening status port 15020
2023-12-07T16:37:58.819513Z     info    Workload SDS socket not found. Starting Istio SDS Server
2023-12-07T16:37:58.820112Z     info    CA Endpoint istiod.istio-system.svc:15012, provider Citadel
2023-12-07T16:37:58.820279Z     info    Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2023-12-07T16:37:58.851939Z     info    ads     All caches have been synced up in 46.862493ms, marking server ready
2023-12-07T16:37:58.854338Z     info    xdsproxy        Initializing with upstream address "istiod.istio-system.svc:15012" and cluster "Kubernetes"
2023-12-07T16:37:58.863897Z     info    sds     Starting SDS grpc server
2023-12-07T16:37:58.865043Z     info    starting Http service at 127.0.0.1:15004
2023-12-07T16:37:58.866108Z     info    Pilot SAN: [istiod.istio-system.svc]
2023-12-07T16:37:58.872258Z     info    Starting proxy agent
2023-12-07T16:37:58.872437Z     info    starting
2023-12-07T16:37:58.872529Z     info    Envoy command: [-c etc/istio/proxy/envoy-rev.json --drain-time-s 45 --drain-strategy immediate --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --allow-unknown-static-fields --log-format %Y-%m-%dT%T.%fZ   %l      envoy %n %g:%#  %v      thread=%t -l warning --component-log-level misc:error --concurrency 2]
2023-12-07T16:37:59.028579Z     info    xdsproxy        connected to upstream XDS server: istiod.istio-system.svc:15012
2023-12-07T16:37:59.114412Z     warn    ca      ca request failed, starting attempt 1 in 104.293435ms
2023-12-07T16:37:59.220208Z     warn    ca      ca request failed, starting attempt 2 in 210.453508ms
2023-12-07T16:37:59.436033Z     warn    ca      ca request failed, starting attempt 3 in 436.782402ms
2023-12-07T16:37:59.873742Z     warn    ca      ca request failed, starting attempt 4 in 858.466147ms
2023-12-07T16:38:00.732809Z     warn    sds     failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 172.20.42.15:15012: connect: connection refused"
2023-12-07T16:38:01.333621Z     warn    ca      ca request failed, starting attempt 1 in 100.275377ms
2023-12-07T16:38:01.435077Z     warn    ca      ca request failed, starting attempt 2 in 217.556266ms
2023-12-07T16:38:01.653798Z     warn    ca      ca request failed, starting attempt 3 in 419.160448ms
2023-12-07T16:38:02.073252Z     warn    ca      ca request failed, starting attempt 4 in 848.304805ms
2023-12-07T16:38:02.921902Z     warn    sds     failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 172.20.42.15:15012: connect: connection refused"
2023-12-07T16:38:03.572559Z     warn    ca      ca request failed, starting attempt 1 in 96.372124ms
2023-12-07T16:38:03.670406Z     warn    ca      ca request failed, starting attempt 2 in 194.402948ms
2023-12-07T16:38:03.865241Z     warn    ca      ca request failed, starting attempt 3 in 375.20709ms
2023-12-07T16:38:04.240850Z     warn    ca      ca request failed, starting attempt 4 in 721.763932ms
2023-12-07T16:38:04.972798Z     info    cache   generated new workload certificate      latency=1.673538997s ttl=23h59m59.027207436s
2023-12-07T16:38:04.972846Z     info    cache   Root cert has changed, start rotating root cert
2023-12-07T16:38:04.972871Z     info    ads     XDS: Incremental Pushing ConnectedEndpoints:0 Version:
2023-12-07T16:38:04.972934Z     info    cache   returned workload trust anchor from cache       ttl=23h59m59.027067566s
2023-12-07T16:38:19.767075Z     info    xdsproxy        connected to upstream XDS server: istiod.istio-system.svc:15012
2023-12-07T16:38:19.795211Z     info    ads     ADS: new connection for node:istio-ingressgateway-8fccfb4b9-f5t9q.istio-system-1
2023-12-07T16:38:19.795455Z     info    cache   returned workload certificate from cache        ttl=23h59m44.204548721s
2023-12-07T16:38:19.795844Z     info    ads     SDS: PUSH request for node:istio-ingressgateway-8fccfb4b9-f5t9q.istio-system resources:1 size:4.0kB resource:default
2023-12-07T16:38:19.797657Z     info    ads     ADS: new connection for node:istio-ingressgateway-8fccfb4b9-f5t9q.istio-system-2
2023-12-07T16:38:19.797798Z     info    cache   returned workload trust anchor from cache       ttl=23h59m44.20220605s
2023-12-07T16:38:19.798047Z     info    ads     SDS: PUSH request for node:istio-ingressgateway-8fccfb4b9-f5t9q.istio-system resources:1 size:1.1kB resource:ROOTCA
2023-12-07T16:38:19.821752Z     warning envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:138    gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) 0.0.0.0_8080: Didn't find a registered implementation for 'envoy.filters.listener.proxy_protocol' with type URL: ''
0.0.0.0_8081: Didn't find a registered implementation for 'envoy.filters.listener.proxy_protocol' with type URL: ''
        thread=16
2023-12-07T16:38:20.204151Z     warn    Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 2 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2023-12-07T16:38:22.203660Z     warn    Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 2 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2023-12-07T16:38:24.204390Z     warn    Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 2 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2023-12-07T16:38:26.204072Z     warn    Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 2 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2023-12-07T16:38:28.203571Z     warn    Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 2 successful, 0 rejected; lds updates: 0 successful, 1 rejected

So, my question is, what am I doing wrong in order to enable Proxy Protocol and be able to read the requester IP address?

Thanks for your help in advance.

skonto commented 9 months ago

Hi @jdiazper, not sure about the rest of the setup but I suspect the configuration error you are seeing:

Didn't find a registered implementation for 'envoy.filters.listener.proxy_protocol' with type URL: '' thread=16

is related to the outdated envoy filter syntax. Check a correct Envoy sample here. Instead of just:

    - name: envoy.filters.listener.proxy_protocol

you need:

    - name: envoy.filters.listener.proxy_protocol
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.proxy_protocol.v3.ProxyProtocol

See a similar issue here. Would you like to try that?
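
For reference, folding that fix into the EnvoyFilter.yaml from the original report would look roughly like this (same name, namespace and selector as above; the tls_inspector entry gets the analogous typed_config):

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: proxy-protocol
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: LISTENER
    patch:
      operation: MERGE
      value:
        listener_filters:
        # proxy_protocol filter with the v3 typed_config, as suggested above
        - name: envoy.filters.listener.proxy_protocol
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.listener.proxy_protocol.v3.ProxyProtocol
        # tls_inspector with its matching v3 typed_config
        - name: envoy.filters.listener.tls_inspector
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector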

jdiazper commented 9 months ago

Hi @skonto , this indeed has fixed the filter error. Thanks!

However, I am still receiving upstream connect error or disconnect/reset before headers. reset reason: connection termination

From what I can see in the logs, the request reaches the Istio gateway pod.

kubectl -n istio-system logs -f istio-ingressgateway-8fccfb4b9-rfrmb
{"connection_termination_details":null,"bytes_sent":95,"response_code_details":"upstream_reset_before_response_started{connection_termination}","bytes_received":0,"upstream_transport_failure_reason":null,"route_name":null,"requested_server_name":null,"upstream_host":"10.99.105.178:8081","upstream_service_time":null,"user_agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36","upstream_cluster":"outbound|80||knative-local-gateway.istio-system.svc.cluster.local","response_code":503,"authority":"backend.test-service.svc.cluster.local","start_time":"2023-12-12T09:53:02.692Z","downstream_local_address":"10.0.49.117:80","x_forwarded_for":"XXX.XXX.XXX.XXX","downstream_remote_address":"XXX.XXX.XXX.XXX:45822","method":"GET","response_flags":"UC","duration":0,"upstream_local_address":"10.99.105.178:50034","request_id":"6651db19-24a5-4921-99a9-2a0beae96fa7","path":"/","protocol":"HTTP/1.1"}

However, when I query the istio-proxy container of the Knative pod, there is nothing from the request above.

kubectl -n test-service logs -c istio-proxy backend-00001-deployment-67f9d98749-7nj44 | grep "XXX.XXX.XXX.XXX"

skonto commented 9 months ago

Hey, could it be that you have set networking.knative.dev/visibility: "cluster-local"?

jdiazper commented 9 months ago

Hi @skonto ,

I have modified the service file to look like below

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: backend
  namespace: test-service
  labels:
    networking.knative.dev/disable-auto-tls: "true"
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/initial-scale: "1"
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "10"
    spec:
      containers:
        - image: fake-registry.github.com/fakepath/backend:1.0.0
          env:
            - name: ENVIRONMENT
              value: test

But unfortunately I am still receiving the same upstream connect error or disconnect/reset before headers. reset reason: connection termination, and requests are not reaching the service.

As an additional note, the Kubernetes cluster is on AWS EKS 1.28.

skonto commented 9 months ago

Hi @jdiazper, before even adding the envoy filter, which will block any request that is not proxy-protocol based anyway (unless you set "allow_requests_without_proxy_protocol": true), let's try to fix the Knative setup.

Btw I am not sure if the envoy filter makes sense, as it is meant only for TCP traffic. From what I read, what should happen is:

The client IP is retrieved from the PROXY protocol by the gateway and set (or appended) in the X-Forwarded-For and X-Envoy-External-Address header.

I am not sure what exactly is succeeding in the httpbin example above.

Let's do the following to test/fix the Knative setup:

a) Let's avoid using the virtual service.

b) set a sample domain:

kubectl patch configmap/config-domain   --namespace knative-serving   --type merge   --patch '{"data":{"example.com":""}}'

c) use curl -H "Host: backend.test-service.example.com" http://a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com

You can check a similar guide here; unfortunately I can't test on EKS at the moment, so let's first try to make Knative work on the cluster.
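
For step (a), removing the hand-written virtual service from the earlier setup would be something like this (name and namespace taken from the virtualservice.yaml above):

kubectl -n test-service delete virtualservice test-service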

jdiazper commented 9 months ago

Hi @skonto ,

Thanks for your help!!

Following your advice I can see that the service is successfully called and the headers contain the client IP (see logs below).

Sample logs from backend service

[2023-12-14 16:30:55 +0000] [7] [DEBUG] GET /
[2023-12-14 16:30:55 +0000] [7] [DEBUG] Request Headers Host: backend.test-service.example.com
User-Agent: curl/7.81.0
Accept: */*
Forwarded: for=XXX.XXX.XXX.XXX;proto=http, for=10.99.134.226
K-Proxy-Request: activator
X-B3-Parentspanid: 4197d0dd9c083110
X-B3-Sampled: 0
X-B3-Spanid: 6db83aed2f1e8d88
X-B3-Traceid: 9bf42975efafa1514197d0dd9c083110
X-Envoy-Attempt-Count: 1
X-Envoy-External-Address: XXX.XXX.XXX.XXX
X-Forwarded-For: XXX.XXX.XXX.XXX, 10.99.134.226, 127.0.0.6
X-Forwarded-Proto: http
X-Request-Id: d1def3cf-339f-4a9e-a4d2-537361c3aeee

While this is great, all my endpoints currently use a Virtual Service so I can route requests from https://mydomain.com/ based on the path.

e.g.

https://mydomain.com/service1/hello -> call service1
https://mydomain.com/service2/hello -> call service2
etc...

However, if I use the config-domain, all my requests would have to change to

https://service1.namespace.mydomain.com/
https://service2.namespace.mydomain.com/
etc...

Is there any way I can continue using the Virtual Service, or an alternative that avoids having to change all the existing endpoints?
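
To illustrate, the path-based pattern I would like to keep looks roughly like this (an extension of the virtualservice.yaml above; the service names are placeholders, and this is only meant to show the routing shape, not a verified fix):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: path-routing
  namespace: test-service
spec:
  gateways:
  - knative-serving/knative-ingress-gateway
  hosts:
  - mydomain.com
  http:
  - match:
    - uri:
        prefix: "/service1/"
    rewrite:
      authority: service1.test-service.svc.cluster.local
    route:
    - destination:
        host: knative-local-gateway.istio-system.svc.cluster.local
        port:
          number: 80
  - match:
    - uri:
        prefix: "/service2/"
    rewrite:
      authority: service2.test-service.svc.cluster.local
    route:
    - destination:
        host: knative-local-gateway.istio-system.svc.cluster.local
        port:
          number: 80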

Thanks again for your help!

skonto commented 9 months ago

I think what you are looking for is a domain mapping.

Here is an example. First enable: autocreate-cluster-domain-claims: "true" in config-network cm in knative-serving ns.
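
If you prefer a one-liner, the same change can be applied with a patch, mirroring the config-domain patch from the earlier comment (a sketch):

kubectl patch configmap/config-network \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"autocreate-cluster-domain-claims":"true"}}'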

Then apply the dm with a custom domain hello.home:

apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: hello.home
spec:
  ref:
    name: helloworld-go
    kind: Service
    apiVersion: serving.knative.dev/v1

Then I can do the following (this is minikube, so probably different from your setup, and so I use a Host header):

$ kubectl get ksvc -n test
NAME            URL                                     LATESTCREATED         LATESTREADY           READY   REASON
helloworld-go   http://helloworld-go.test.example.com   helloworld-go-00001   helloworld-go-00001   True

$ kubectl get po -n test
NAME                                              READY   STATUS    RESTARTS   AGE
helloworld-go-00001-deployment-76f4bd5c4b-fjm7j   2/3     Running   0          4s

$ curl -H "Host: hello.home" http://192.168.39.125:32385
Hello Go Sample v1!

$ curl -H "Host: helloworld-go.test.example.com" http://192.168.39.125:32385
Hello Go Sample v1!

The virtual services you get:

$ kubectl get virtualservices.networking.istio.io   -n test
NAME                    GATEWAYS                                                                              HOSTS                                                                                                                     AGE
hello.home-ingress      ["knative-serving/knative-ingress-gateway"]                                           ["hello.home"]                                                                                                            6m35s
helloworld-go-ingress   ["knative-serving/knative-ingress-gateway","knative-serving/knative-local-gateway"]   ["helloworld-go.test","helloworld-go.test.example.com","helloworld-go.test.svc","helloworld-go.test.svc.cluster.local"]   8m28s
helloworld-go-mesh      ["mesh"]                                                                              ["helloworld-go.test","helloworld-go.test.svc","helloworld-go.test.svc.cluster.local"]                                    8m28s

Domain mapping essentially uses a host rewrite rule and targets the related gateway. In detail:

$ kubectl describe virtualservices.networking.istio.io hello.home-ingress    -n test 
Name:         hello.home-ingress
Namespace:    test
Labels:       networking.internal.knative.dev/ingress=hello.home
Annotations:  networking.knative.dev/ingress.class: istio.ingress.networking.knative.dev
              serving.knative.dev/creator: minikube-user
              serving.knative.dev/lastModifier: minikube-user
API Version:  networking.istio.io/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2023-12-20T11:35:21Z
  Generation:          1
....
Spec:
  Gateways:
    knative-serving/knative-ingress-gateway
  Hosts:
    hello.home
  Http:
    Headers:
      Request:
        Set:
          K - Network - Hash:  e1c8b6eafa3b8d150f7596327b60280db748bd900e97a4c07791299719425f8d
    Match:
      Authority:
        Prefix:  hello.home
      Gateways:
        knative-serving/knative-ingress-gateway
      Headers:
        K - Network - Hash:
          Exact:  override
    Retries:
    Rewrite:
      Authority:  helloworld-go.test.svc.cluster.local
    Route:
      Destination:
        Host:  helloworld-go.test.svc.cluster.local
        Port:
          Number:  80
      Headers:
        Request:
          Set:
            K - Original - Host:  hello.home
      Weight:                     100
    Match:
      Authority:
        Prefix:  hello.home
      Gateways:
        knative-serving/knative-ingress-gateway
    Retries:
    Rewrite:
      Authority:  helloworld-go.test.svc.cluster.local
    Route:
      Destination:
        Host:  helloworld-go.test.svc.cluster.local
        Port:
          Number:  80
      Headers:
        Request:
          Set:
            K - Original - Host:  hello.home
      Weight:                     100
Events:                           <none>

jdiazper commented 9 months ago

Hi @skonto ,

That's interesting...

I have made the suggested changes above, and when I use the domain mapping and curl hello.home I get the same upstream connect error or disconnect/reset before headers. reset reason: connection termination error as when I use the virtual service.

What I have done is:

I have edited the configmap (kubectl -n knative-serving edit cm config-network) and added autocreate-cluster-domain-claims: "true" after _example.

kubectl -n knative-serving describe cm config-network

Output:

Name:         config-network
Namespace:    knative-serving
Labels:       app.kubernetes.io/component=networking
              app.kubernetes.io/name=knative-serving
              app.kubernetes.io/version=1.12.2
Annotations:  knative.dev/example-checksum: 0573e07d
              manifestival: new

Data
====
_example:
----
################################
#                              #
#    EXAMPLE CONFIGURATION     #
#                              #
################################

# This block is not actually functional configuration,
# but serves to illustrate the available configuration
# options and document them in a way that is accessible
# to users that `kubectl edit` this config map.
#
# These sample configuration options may be copied out of
# this example block and unindented to be in the data block
# to actually change the configuration.

# ingress-class specifies the default ingress class
# to use when not dictated by Route annotation.
#
# If not specified, will use the Istio ingress.
#
# Note that changing the Ingress class of an existing Route
# will result in undefined behavior.  Therefore it is best to only
# update this value during the setup of Knative, to avoid getting
# undefined behavior.
ingress-class: "istio.ingress.networking.knative.dev"

# certificate-class specifies the default Certificate class
# to use when not dictated by Route annotation.
#
# If not specified, will use the Cert-Manager Certificate.
#
# Note that changing the Certificate class of an existing Route
# will result in undefined behavior.  Therefore it is best to only
# update this value during the setup of Knative, to avoid getting
# undefined behavior.
certificate-class: "cert-manager.certificate.networking.knative.dev"

# namespace-wildcard-cert-selector specifies a LabelSelector which
# determines which namespaces should have a wildcard certificate
# provisioned.
#
# Use an empty value to disable the feature (this is the default):
#   namespace-wildcard-cert-selector: ""
#
# Use an empty object to enable for all namespaces
#   namespace-wildcard-cert-selector: {}
#
# Useful labels include the "kubernetes.io/metadata.name" label to
# avoid provisioning a certificate for the "kube-system" namespaces.
# Use the following selector to match pre-1.0 behavior of using
# "networking.knative.dev/disableWildcardCert" to exclude namespaces:
#
# matchExpressions:
# - key: "networking.knative.dev/disableWildcardCert"
#   operator: "NotIn"
#   values: ["true"]
namespace-wildcard-cert-selector: ""

# domain-template specifies the golang text template string to use
# when constructing the Knative service's DNS name. The default
# value is "{{.Name}}.{{.Namespace}}.{{.Domain}}".
#
# Valid variables defined in the template include Name, Namespace, Domain,
# Labels, and Annotations. Name will be the result of the tag-template
# below, if a tag is specified for the route.
#
# Changing this value might be necessary when the extra levels in
# the domain name generated is problematic for wildcard certificates
# that only support a single level of domain name added to the
# certificate's domain. In those cases you might consider using a value
# of "{{.Name}}-{{.Namespace}}.{{.Domain}}", or removing the Namespace
# entirely from the template. When choosing a new value be thoughtful
# of the potential for conflicts - for example, when users choose to use
# characters such as `-` in their service, or namespace, names.
# {{.Annotations}} or {{.Labels}} can be used for any customization in the
# go template if needed.
# We strongly recommend keeping namespace part of the template to avoid
# domain name clashes:
# eg. '{{.Name}}-{{.Namespace}}.{{ index .Annotations "sub"}}.{{.Domain}}'
# and you have an annotation {"sub":"foo"}, then the generated template
# would be {Name}-{Namespace}.foo.{Domain}
domain-template: "{{.Name}}.{{.Namespace}}.{{.Domain}}"

# tag-template specifies the golang text template string to use
# when constructing the DNS name for "tags" within the traffic blocks
# of Routes and Configuration.  This is used in conjunction with the
# domain-template above to determine the full URL for the tag.
tag-template: "{{.Tag}}-{{.Name}}"

# auto-tls is deprecated and replaced by external-domain-tls
auto-tls: "Disabled"

# Controls whether TLS certificates are automatically provisioned and
# installed in the Knative ingress to terminate TLS connections
# for cluster external domains (like: app.example.com)
# - Enabled: enables the TLS certificate provisioning feature for cluster external domains.
# - Disabled: disables the TLS certificate provisioning feature for cluster external domains.
external-domain-tls: "Disabled"

# Controls weather TLS certificates are automatically provisioned and
# installed in the Knative ingress to terminate TLS connections
# for cluster local domains (like: app.namespace.svc.<your-cluster-domain>)
# - Enabled: enables the TLS certificate provisioning feature for cluster cluster-local domains.
# - Disabled: disables the TLS certificate provisioning feature for cluster cluster local domains.
# NOTE: This flag is in an alpha state and is mostly here to enable internal testing
#       for now. Use with caution.
cluster-local-domain-tls: "Disabled"

# internal-encryption is deprecated and replaced by system-internal-tls
internal-encryption: "false"

# system-internal-tls controls weather TLS encryption is used for connections between
# the internal components of Knative:
# - ingress to activator
# - ingress to queue-proxy
# - activator to queue-proxy
#
# Possible values for this flag are:
# - Enabled: enables the TLS certificate provisioning feature for cluster cluster-local domains.
# - Disabled: disables the TLS certificate provisioning feature for cluster cluster local domains.
# NOTE: This flag is in an alpha state and is mostly here to enable internal testing
#       for now. Use with caution.
system-internal-tls: "Disabled"

# Controls the behavior of the HTTP endpoint for the Knative ingress.
# It requires auto-tls to be enabled.
# - Enabled: The Knative ingress will be able to serve HTTP connection.
# - Redirected: The Knative ingress will send a 301 redirect for all
# http connections, asking the clients to use HTTPS.
#
# "Disabled" option is deprecated.
http-protocol: "Enabled"

# rollout-duration contains the minimal duration in seconds over which the
# Configuration traffic targets are rolled out to the newest revision.
rollout-duration: "0"

# autocreate-cluster-domain-claims controls whether ClusterDomainClaims should
# be automatically created (and deleted) as needed when DomainMappings are
# reconciled.
#
# If this is "false" (the default), the cluster administrator is
# responsible for creating ClusterDomainClaims and delegating them to
# namespaces via their spec.Namespace field. This setting should be used in
# multitenant environments which need to control which namespace can use a
# particular domain name in a domain mapping.
#
# If this is "true", users are able to associate arbitrary names with their
# services via the DomainMapping feature.
autocreate-cluster-domain-claims: "false"

# If true, networking plugins can add additional information to deployed
# applications to make their pods directly accessible via their IPs even if mesh is
# enabled and thus direct-addressability is usually not possible.
# Consumers like Knative Serving can use this setting to adjust their behavior
# accordingly, i.e. to drop fallback solutions for non-pod-addressable systems.
#
# NOTE: This flag is in an alpha state and is mostly here to enable internal testing
#       for now. Use with caution.
enable-mesh-pod-addressability: "false"

# mesh-compatibility-mode indicates whether consumers of network plugins
# should directly contact Pod IPs (most efficient), or should use the
# Cluster IP (less efficient, needed when mesh is enabled unless
# `enable-mesh-pod-addressability`, above, is set).
# Permitted values are:
#  - "auto" (default): automatically determine which mesh mode to use by trying Pod IP and falling back to Cluster IP as needed.
#  - "enabled": always use Cluster IP and do not attempt to use Pod IPs.
#  - "disabled": always use Pod IPs and do not fall back to Cluster IP on failure.
mesh-compatibility-mode: "auto"

# Defines the scheme used for external URLs if auto-tls is not enabled.
# This can be used for making Knative report all URLs as "HTTPS" for example, if you're
# fronting Knative with an external loadbalancer that deals with TLS termination and
# Knative doesn't know about that otherwise.
default-external-scheme: "http"

autocreate-cluster-domain-claims:
----
true

BinaryData
====

Events:  <none>

I restarted istio-system pods

kubectl -n istio-system delete pods --all

apply the domain mapping

domain-mapping.yaml

apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: hello.home
spec:
  ref:
    name: backend
    kind: Service
    apiVersion: serving.knative.dev/v1

kubectl -n test-service apply -f domain-mapping.yaml

output

domainmapping.serving.knative.dev/hello.home created

Run commands:

kubectl get ksvc -n test-service

output:

NAME      URL                                       LATESTCREATED   LATESTREADY     READY   REASON
backend   http://backend.test-service.example.com   backend-00001   backend-00001   True    

kubectl get po -n test-service

output:

NAME                                                READY     STATUS    RESTARTS   AGE
backend-00001-deployment-84b7f96856-cxc2f   3/3     Running   0         51m
curl -H "Host: hello.home" http://a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com

output:

upstream connect error or disconnect/reset before headers. reset reason: connection termination

However if I run:

curl -H "Host: backend.test-service.example.com" http://a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com -v

output

*   Trying 3.120.24.218:80...
* Connected to a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com (3.120.24.218) port 80 (#0)
> GET / HTTP/1.1
> Host: backend.test-service.example.com
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 0
< content-type: text/html; charset=utf-8
< date: Fri, 22 Dec 2023 15:54:14 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 9
< 
* Connection #0 to host a8ace71c642ba49b39f0f0bc25d9f37b-1902910119.eu-central-1.elb.amazonaws.com left intact

The list of virtual services I get is:

kubectl get virtualservices.networking.istio.io -n test-service

output

NAME                          GATEWAYS                                                                              HOSTS                                                                                                                                                             AGE
hello.home-ingress            ["knative-serving/knative-ingress-gateway"]                                           ["hello.home"]                                                                                                                                                    73s
backend-ingress               ["knative-serving/knative-ingress-gateway","knative-serving/knative-local-gateway"]   ["backend.test-service","backend.test-service.example.com","backend.test-service.svc","backend.test-service.svc.cluster.local"]   44m
backend-mesh                  ["mesh"]                                                                              ["backend.test-service","backend.test-service.svc","backend.test-service.svc.cluster.local"]                                      44m

virtual service

kubectl describe virtualservices.networking.istio.io hello.home-ingress -n test-service

output

Name:         hello.home-ingress
Namespace:    test-service
Labels:       networking.internal.knative.dev/ingress=hello.home
Annotations:  networking.knative.dev/ingress.class: istio.ingress.networking.knative.dev
              serving.knative.dev/creator: kubernetes-admin
              serving.knative.dev/lastModifier: kubernetes-admin
API Version:  networking.istio.io/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2023-12-22T15:07:22Z
  Generation:          1
  Managed Fields:
    API Version:  networking.istio.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:networking.knative.dev/ingress.class:
          f:serving.knative.dev/creator:
          f:serving.knative.dev/lastModifier:
        f:labels:
          .:
          f:networking.internal.knative.dev/ingress:
        f:ownerReferences:
          .:
          k:{"uid":"d16410fb-0f8c-4156-b089-c36a496cad63"}:
      f:spec:
        .:
        f:gateways:
        f:hosts:
        f:http:
    Manager:    controller
    Operation:  Update
    Time:       2023-12-22T15:07:22Z
  Owner References:
    API Version:           networking.internal.knative.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Ingress
    Name:                  hello.home
    UID:                   d16410fb-0f8c-4156-b089-c36a496cad63
  Resource Version:        43307
  UID:                     f86ce515-84e9-45c5-8289-20b570f2430c
Spec:
  Gateways:
    knative-serving/knative-ingress-gateway
  Hosts:
    hello.home
  Http:
    Headers:
      Request:
        Set:
          K - Network - Hash:  5ab79157aea7d6972dc18ac6a3eaae7e303fba9072348a66f6accffb8afc66ce
    Match:
      Authority:
        Prefix:  hello.home
      Gateways:
        knative-serving/knative-ingress-gateway
      Headers:
        K - Network - Hash:
          Exact:  override
    Retries:
    Rewrite:
      Authority:  backend.test-service.svc.cluster.local
    Route:
      Destination:
        Host:  backend.test-service.svc.cluster.local
        Port:
          Number:  80
      Headers:
        Request:
          Set:
            K - Original - Host:  hello.home
      Weight:                     100
    Match:
      Authority:
        Prefix:  hello.home
      Gateways:
        knative-serving/knative-ingress-gateway
    Retries:
    Rewrite:
      Authority:  backend.test-service.svc.cluster.local
    Route:
      Destination:
        Host:  backend.test-service.svc.cluster.local
        Port:
          Number:  80
      Headers:
        Request:
          Set:
            K - Original - Host:  hello.home
      Weight:                     100
Events:                           <none>

What am I doing wrong?!

jdiazper commented 8 months ago

Hi @skonto ,

Sorry to bother you! Did you get a chance to look at this?

Thanks for your help in advance!!

jdiazper commented 8 months ago

Hi @skonto,

After repeating the process multiple times, I wonder if this could be a bug? If so, could you please change the label accordingly?

Thanks again for your help and support.

skonto commented 8 months ago

Hi @jdiazper, I have some cycles this week; let me take a look first. I will get back to you here.

skonto commented 8 months ago

I tried this on OCP on AWS by exposing the ingress gateway on the AWS lb:

$ curl  -k -H "Host: hello.home"  https://a7b958e51e73240569946d2dce5c436f-1398313270.us-east-1.elb.amazonaws.com
Hello Go Sample v1!

$ curl  -k -H "Host: helloworld-go-kserve-demo.apps.ci-ln-8y4h6vk-76ef8.origin-ci-int-aws.dev.rhcloud.com"  https://a7b958e51e73240569946d2dce5c436f-1398313270.us-east-1.elb.amazonaws.com
Hello Go Sample v1!

I am using https but that should not affect the header stuff:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: knative-local-gateway
  namespace: knative-serving
spec:
  selector:
    istio: ingressgateway
  servers:
    - hosts:
        - '*'
      port:
        name: http
        number: 8081
        protocol: HTTP

Could you please confirm that you have Istio enabled in the Serving namespace? In my setup I have something similar to this (sidecars are actually injected into the activator, for example):

NAME                                     READY   STATUS    RESTARTS   AGE
activator-79bb9b57b8-hbwtx               2/2     Running   0          37m
activator-79bb9b57b8-kvzmw               2/2     Running   0          38m
autoscaler-795fcb857b-5g9jm              2/2     Running   0          38m
autoscaler-795fcb857b-tw4h4              2/2     Running   0          38m
autoscaler-hpa-6f8d5766b9-2zx8p          1/1     Running   0          38m
autoscaler-hpa-6f8d5766b9-zmzv9          1/1     Running   0          38m
controller-7c66679589-2kvtj              1/1     Running   0          37m
controller-7c66679589-h6q7g              1/1     Running   0          37m
domain-mapping-74b6bf4bd5-scwqd          1/1     Running   0          38m
domain-mapping-74b6bf4bd5-xrbqw          1/1     Running   0          38m
domainmapping-webhook-6c5b5496b9-425w5   1/1     Running   0          38m
domainmapping-webhook-6c5b5496b9-8fjrv   1/1     Running   0          38m
net-istio-controller-56cf8f5bd9-lz47f    1/1     Running   0          37m
net-istio-controller-56cf8f5bd9-v8nw2    1/1     Running   0          37m
net-istio-webhook-5665bb679f-b6cbw       1/1     Running   0          37m
net-istio-webhook-5665bb679f-c2zt4       1/1     Running   0          37m
webhook-6477c746f9-25b2n                 1/1     Running   0          38m
webhook-6477c746f9-8c2fd                 1/1     Running   0          37m

Could you also print the status of the dm:

NAME         URL                  READY   REASON
hello.home   https://hello.home   True    

Here is some output about the resources created; hope that helps to debug:

virtualsvcs.json ingresses.json dms.json

jdiazper commented 8 months ago

Hi @skonto ,

Please find below the requested information:

I can see that the sidecar is present in most of the pods.

NAME                                    READY   STATUS    RESTARTS   AGE
activator-b7df55675-2z4gg               2/2     Running   0          32m
autoscaler-64655fb9c-s8scd              2/2     Running   0          32m
autoscaler-hpa-5cfbbbf988-c54bm         2/2     Running   0          32m
controller-7b854b6dfb-5zxnl             2/2     Running   0          32m
net-istio-controller-586c554d76-bg8lr   1/1     Running   0          32m
net-istio-webhook-6bcff4d984-xj9k9      2/2     Running   0          32m
webhook-6977cb78d8-bw2br                2/2     Running   0          32m

Status of dm:

NAME         URL                 READY   REASON
hello.home   http://hello.home   True  

I've also compared the files and the main difference (other than the expected namespaces and service names) is that in your ingress file you have "serving.knative.openshift.io/enablePassthrough": "true", while in mine that doesn't exist (I don't use OCP), so I guess that makes sense?

When you install istio...

Thank you!

skonto commented 8 months ago

Do you enable the proxy protocol?

I will double check and I will report back here.

Do you install istio via istioctl?

No, downstream I used OpenShift Service Mesh.

Could you please describe the steps & order of your knative/istio installation/configuration?

Let me first reproduce it and then we can port it to Istio, as I use OpenShift SM.

skonto commented 8 months ago

I played a bit with the envoy filter and ended up using something like:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: proxy-protocol
spec:
  workloadSelector:
    labels:
      app: helloworld-go
  configPatches:
  - applyTo: LISTENER
    patch:
      operation: MERGE
      value:
        listener_filters:
        - name: proxy_protocol
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.filters.listener.proxy_protocol.v3.ProxyProtocol"
            allow_requests_without_proxy_protocol: false
        - name: tls_inspector
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector"

This has to be applied with a selector for every sidecar on the path, including the ingressgateway and the activator. So you need separate filters with the proper selectors, e.g. for the ingressgateway:

  workloadSelector:
    labels:
      "istio": "ingressgateway"

Note that I could not use the latest syntax shown here https://istio.io/latest/docs/reference/config/networking/envoy-filter/ (there is an example there), as in my env LISTENER_FILTER is not supported (it would have been more convenient I think).

When setting allow_requests_without_proxy_protocol (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/listener/proxy_protocol/v3/proxy_protocol.proto.html#extensions-filters-listener-proxy-protocol-v3-proxyprotocol) to true, requests did pass:

curl -k -H "Host: helloworld-go....com"  --haproxy-protocol  https://aef14d3fa841843339ba78a333aba001-1473205865...com:443
Hello Go Sample v1!

With the default false value requests don't pass (the activator also needs this set to true to be healthy). In any case, if requests only pass when the flag is true, it means the requests being sent are not proxy protocol based. However, I see that you need to enable proxy protocol on the AWS LB side: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/enable-proxy-protocol.html#proxy-protocol. Regarding the ingress annotation, there is an extra one needed: service.beta.kubernetes.io/aws-load-balancer-type: "nlb" (https://istio.io/v1.15/blog/2020/show-source-ip)? The blog post is old so this needs verification.
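
For reference, both annotations together on the gateway Service would look roughly like this in the IstioOperator from earlier (a sketch; as noted, the nlb annotation comes from an old post and needs verification for your setup):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - enabled: true
      name: istio-ingressgateway
      k8s:
        serviceAnnotations:
          service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"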

So I suspect you need to make sure each hop has proxy protocol enabled. Could you please verify the above, a) the AWS LB setup, and b) the annotation, if Istio supports it as expected, and report back here (I am not able to do this in my env)? For the latter I would first check that AWS LB + Istio + a simple app works as expected with proxy protocol enabled on EKS. Could you do that, showing that the client ip/host is printed with a regular K8s deployment using Istio (no Knative), as it is not clear to me from your description earlier?
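
To check the Classic ELB side, something like this should show whether a ProxyProtocol policy is attached to the backend port (the load balancer name is a placeholder; see the AWS doc linked above):

aws elb describe-load-balancers \
  --load-balancer-names my-ingress-elb \
  --query 'LoadBalancerDescriptions[].BackendServerDescriptions'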

jdiazper commented 8 months ago

Hi @skonto,

Answering your request about "I would first check if AWS LB + ISTIO + simple app works as expected with proxy protocol enabled on EKS. Could you do that showing that client ip/host is printed with a regular K8s deployment using Istio (no Knative), as it is not clear to me from your description earlier?"

I have followed this guide to deploy a simple service and enable proxy protocol.

As a summary:

  1. I have installed Istio using istioctl
    istioctl install

    output:

    This will install the Istio 1.20.0 "default" profile (with components: Istio core, Istiod, and Ingress gateways) into the cluster. Proceed? (y/N) y
    ✔ Istio core installed                                                                                                                                                    
    ✔ Istiod installed                                                                                                                                                        
    ✔ Ingress gateways installed                                                                                                                                              
    ✔ Installation complete                                                                                                                                                   Made this installation the default for injection and validation.
  2. Then I deployed a workload, httpbin, in namespace foo with sidecar injection enabled and exposed httpbin through an ingress gateway.
  3. Then I called the service.
curl http://a681b3f3f054e4cddbc814a9d8477d38-1939553369.eu-central-1.elb.amazonaws.com/headers

output

{
  "headers": {
    "Accept": "*/*", 
    "Host": "a681b3f3f054e4cddbc814a9d8477d38-1939553369.eu-central-1.elb.amazonaws.com", 
    "User-Agent": "curl/7.81.0", 
    "X-B3-Parentspanid": "be4209649da0b1fd", 
    "X-B3-Sampled": "0", 
    "X-B3-Spanid": "b8d396f508cbcdf7", 
    "X-B3-Traceid": "8c18d72ddca54aecbe4209649da0b1fd", 
    "X-Envoy-Attempt-Count": "1", 
    "X-Envoy-Internal": "true", 
    "X-Forwarded-Client-Cert": "By=spiffe://cluster.local/ns/foo/sa/httpbin;Hash=b0949ca3855a3d4b02ca6ccae79594d70e4a063b42d453d4bcfaa879f8a42d13;Subject=\"\";URI=spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"
  }
}

As you can see, at this stage the source ip is not available.

  4. Then, since I use an AWS Classic Load Balancer, I have used the TCP Proxy configuration.

enable-proxy-protocol.yaml

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    accessLogEncoding: JSON
    accessLogFile: /dev/stdout
    defaultConfig:
      gatewayTopology:
        proxyProtocol: {}
  components:
    ingressGateways:
    - enabled: true
      name: istio-ingressgateway
      k8s:
        hpaSpec:
          maxReplicas: 10
          minReplicas: 5
        serviceAnnotations:
          service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
istioctl install -f enable-proxy-protocol.yaml

output:

This will install the Istio 1.20.0 "default" profile (with components: Istio core, Istiod, and Ingress gateways) into the cluster. Proceed? (y/N) y
✔ Istio core installed                                                                                                                                                    
✔ Istiod installed                                                                                                                                                        
✔ Ingress gateways installed                                                                                                                                              
✔ Installation complete
Made this installation the default for injection and validation.
  5. I have restarted the istio-system pods.
kubectl -n istio-system delete pods --all
  6. I sent an API request to the same endpoint as before
curl http://a681b3f3f054e4cddbc814a9d8477d38-1939553369.eu-central-1.elb.amazonaws.com/headers

output

{
  "headers": {
    "Accept": "*/*", 
    "Host": "a681b3f3f054e4cddbc814a9d8477d38-1939553369.eu-central-1.elb.amazonaws.com", 
    "User-Agent": "curl/7.81.0", 
    "X-B3-Parentspanid": "3db3aa3e2a1d92ce", 
    "X-B3-Sampled": "0", 
    "X-B3-Spanid": "ec7cdac78ce22af4", 
    "X-B3-Traceid": "54e8dc89c1f4163e3db3aa3e2a1d92ce", 
    "X-Envoy-Attempt-Count": "1", 
    "X-Envoy-External-Address": "XXX.XXX.XXX.XXX", 
    "X-Forwarded-Client-Cert": "By=spiffe://cluster.local/ns/foo/sa/httpbin;Hash=0ba9152cd166a2de5856f52bf11aac279457d28fb1462c4f76d5ded16b1330d0;Subject=\"\";URI=spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"
  }
}

The IP address is available in the X-Envoy-External-Address header.

Now that we know Istio is handling the source ip, I have deployed Knative (as shown in previous comments), but when I curl the load balancer I get the upstream connect error or disconnect/reset before headers. reset reason: connection termination.

Could it be that knative-serving/knative-ingress-gateway specified in my Virtual Service is not proxy-protocol ready? If so, how can I enable it?

skonto commented 8 months ago

Could it be that knative-serving/knative-ingress-gateway specified in my Virtual Service is not proxy-protocol ready? If so, how can I enable it?

It should be enabled, since you have that set globally for all gateways. To set it per gateway you can use the following (you have to double-check this against your Istio version):

metadata:
  annotations:
    "proxy.istio.io/config": '{"gatewayTopology" : { "proxyProtocol": {} }}'

I would recommend using istioctl to check the configuration at the ingress gateway and compare it with the configuration from before installing Knative. It could be that something is being overridden.

1) You mentioned "X-Envoy-External-Address" works without Knative. Did you apply the envoy filter? 2) Could you please show how your Knative service behaves without applying the envoy filter and after you installed Knative? Can you access the ksvc if you target the ingress service?

jdiazper commented 8 months ago

Hi @skonto ,

  1. You mentioned "X-Envoy-External-Address" works without Knative. Did you apply the envoy filter?

It worked without the Envoy Filter.

  2. Could you please show how your Knative service behaves without applying the envoy filter and after you installed Knative? Can you access the ksvc if you target the ingress service?

I just ran a simple curl like:

curl -v -HHost:test.me http://a75311345845043c08caa3ec2390a77c-192380941.eu-central-1.elb.amazonaws.com/test

output

*   Trying 3.122.77.145:80...
* Connected to a75311345845043c08caa3ec2390a77c-192380941.eu-central-1.elb.amazonaws.com (3.122.77.145) port 80 (#0)
> GET /test HTTP/1.1
> Host:test.me
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< access-control-allow-credentials: true
< access-control-allow-origin: http://test.me
< content-length: 0
< content-type: text/html; charset=utf-8
< date: Wed, 07 Feb 2024 17:04:00 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 62
< 
* Connection #0 to host a75311345845043c08caa3ec2390a77c-192380941.eu-central-1.elb.amazonaws.com left intact

If I look at the logs I can see the printed headers.

[2024-02-07 17:04:00 +0000] [7] [DEBUG] Request Headers Host: backend.test-service.svc.cluster.local
User-Agent: curl/7.81.0
Accept: */*
Forwarded: for=10.0.14.83;proto=http, for=10.99.41.9, for=127.0.0.6
K-Proxy-Request: activator
X-B3-Parentspanid: 06dc0b522c9404be
X-B3-Sampled: 0
X-B3-Spanid: b68f05c3de8e79c2
X-B3-Traceid: 21706d1e61626fa95d0fd72775d02a88
X-Envoy-Attempt-Count: 1
X-Envoy-External-Address: 10.99.41.9
X-Forwarded-For: 10.0.14.83,10.99.41.9, 127.0.0.6, 127.0.0.6
X-Forwarded-Proto: http
X-Request-Id: 35e5a6f4-140b-4802-8a5a-42db031c56b5

127.0.0.1 - - [07/Feb/2024:17:04:00 +0000] 'GET /test HTTP/1.1' 200 0 '-' 'curl/7.81.0' in 18072µs

The ip address 10.99.41.9 belongs to the istio-ingressgateway pod.

However, when I enable proxy protocol, requests don't reach the Knative service, and the logs from the ingress gateway show the 503 error but also that it is detecting the client ip (XXX.XXX.XXX.XXX).

{
  "response_code_details": "upstream_reset_before_response_started{connection_termination}",
  "bytes_sent": 95,
  "method": "GET",
  "upstream_transport_failure_reason": null,
  "path": "/test",
  "connection_termination_details": null,
  "upstream_service_time": null,
  "requested_server_name": null,
  "response_code": 503,
  "upstream_host": "10.99.139.99:8081",
  "duration": 1,
  "downstream_local_address": "10.0.48.78:80",
  "start_time": "2024-02-07T21:48:53.062Z",
  "upstream_local_address": "10.99.139.99:59898",
  "protocol": "HTTP/1.1",
  "x_forwarded_for": "XXX.XXX.XXX.XXX",
  "request_id": "58bb3760-f210-45dd-a9f4-26e47d246b7e",
  "downstream_remote_address": "XXX.XXX.XXX.XXX:65260",
  "route_name": null,
  "response_flags": "UC",
  "authority": "backend.test-service.svc.cluster.local",
  "user_agent": "curl/7.81.0",
  "bytes_received": 0,
  "upstream_cluster": "outbound|80||knative-local-gateway.istio-system.svc.cluster.local",
}

I have also applied the EnvoyFilter, but as soon as I apply it both services (httpbin and knative) stop working.

The last thing I tried was to explicitly use the suggested annotation (I've looked at the Istio documentation), but it doesn't make a difference.

skonto commented 7 months ago

I suggest you first check whether the old guide here still works (it uses this part of the Istio config). That guide relies on setting an envoy filter plus envoy's configuration here. The XFF handling, btw, is quite complex, but the docs guide you on what to expect (Istio relies on envoy for header setting).
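
For reference, the envoy filter in that kind of guide is roughly of this shape (a sketch from memory, not verified against the linked guide, so treat the filter names and patch structure as assumptions to confirm there):

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: proxy-protocol
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: LISTENER
    patch:
      operation: MERGE
      value:
        listener_filters:
        # Strip the PROXY protocol header before TLS inspection / HTTP parsing.
        - name: envoy.filters.listener.proxy_protocol
        - name: envoy.filters.listener.tls_inspector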

When you run the simple curl above you should get your machine's ip, not the ingress one. Could you verify that you see your remote client ip in the istio ingress logs by default (the envoy filter affects ingress <-> envoy sidecar communication, afaik)?

Also, could you compare mesh vs non-mesh mode (no sidecar injection)?
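
For the non-mesh comparison, sidecar injection is typically toggled with the namespace label, e.g. (assuming you enabled injection at the namespace level):

apiVersion: v1
kind: Namespace
metadata:
  name: foo
  labels:
    # "enabled" gives mesh mode; remove the label (or set it to "disabled")
    # and restart the workload pods to test without a sidecar.
    istio-injection: enabled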

If you face any issues with the guide above, please create a ticket on the Istio side to discuss further. Once this works we can look at the Knative side again. Also feel free to start a discussion on the Knative Slack channel to get more visibility so others can chime in.

I suspect your goal is to get the remote client's ip at the service side, not to use the proxy protocol end-to-end as described here.

cc @dprotaso, he may have more background info on EKS and Knative integration.

jdiazper commented 7 months ago

Hi @skonto ,

Thanks for all the links and guidance provided. I have been trying to make it work, but unfortunately I still keep receiving the upstream connect error or disconnect/reset before headers. reset reason: connection termination.

@dprotaso welcome to this issue!

Based on your experience, do you know if I am missing anything in my installation/configuration that makes the gateway return that error?

As always, thanks for your help and support!

jdiazper commented 6 months ago

Hi @dprotaso ,

Is there any chance you can look at this issue? I would really appreciate any advice you could give me to get it resolved.

Thanks!

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

skonto commented 3 months ago

/remove-lifecycle stale

skonto commented 3 months ago

Hi @jdiazper sorry for the delay, not many cycles :( Could you add some more info as mentioned in https://github.com/knative/serving/issues/14718#issuecomment-1935700032?

cc @ReToCode

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.