dspeck1 opened this issue 5 months ago
@dspeck1 thanks for creating this issue! What values.yaml are you using to create the vCluster? Knative should work when installed inside the vCluster; the plugin is just there if you don't want to install it in every vCluster.
Thanks for helping! Below is the config. Please let me know if anything sticks out in the sync settings. I'm wondering if reconciliation between the parent cluster and the vcluster could be a cause of the issue, since pods are created and torn down frequently in Knative. (A sketch for checking the syncer logs follows the config.)
monitoring:
  serviceMonitor:
    enabled: false

enableHA: true

sync:
  services:
    enabled: true
  configmaps:
    enabled: true
    all: false
  secrets:
    enabled: true
    all: false
  endpoints:
    enabled: true
  pods:
    enabled: true
    ephemeralContainers: false
    status: false
  events:
    enabled: true
  persistentvolumeclaims:
    enabled: true
  ingresses:
    enabled: true
  ingressclasses:
    # By default IngressClasses sync is enabled when the Ingress sync is enabled
    # but it can be explicitly disabled by setting:
    enabled: false
  fake-nodes:
    enabled: false
  fake-persistentvolumes:
    enabled: false
  nodes:
    fakeKubeletIPs: false
    enabled: true
    # If nodes sync is enabled, and syncAllNodes = true, the virtual cluster
    # will sync all nodes instead of only the ones where some pods are running.
    syncAllNodes: true
    # nodeSelector is used to limit which nodes get synced to the vcluster,
    # and which nodes are used to run vcluster pods.
    # A valid string representation of a label selector must be used.
    # if true, vcluster will run with a scheduler and node changes are possible
    # from within the virtual cluster. This is useful if you would like to
    # taint, drain and label nodes from within the virtual cluster
    enableScheduler: false
    # DEPRECATED: use enable scheduler instead
    # syncNodeChanges allows vcluster user edits of the nodes to be synced down to the host nodes.
    # Write permissions on node resource will be given to the vcluster.
    syncNodeChanges: false
  persistentvolumes:
    enabled: false
  storageclasses:
    enabled: false
  # formerly named - "legacy-storageclasses"
  hoststorageclasses:
    enabled: true
  priorityclasses:
    enabled: false
  networkpolicies:
    enabled: false
  volumesnapshots:
    enabled: false
  poddisruptionbudgets:
    enabled: false
  serviceaccounts:
    enabled: false

# If enabled, will fallback to host dns for resolving domains. This
# is useful if using istio or dapr in the host cluster and sidecar
# containers cannot connect to the central instance. It's also useful
# if you want to access host cluster services from within the vcluster.
fallbackHostDns: false

# Map Services between host and virtual cluster
mapServices:
  # Services that should get mapped from the
  # virtual cluster to the host cluster.
  # vcluster will make sure to sync the service
  # ip to the host cluster automatically as soon
  # as the service exists.
  # For example:
  # fromVirtual:
  #   - from: my-namespace/name
  #     to: host-service
  fromVirtual: []
  # Same as from virtual, but instead sync services
  # from the host cluster into the virtual cluster.
  # If the namespace does not exist, vcluster will
  # also create the namespace for the service.
  fromHost: []

proxy:
  metricsServer:
    nodes:
      enabled: false
    pods:
      enabled: false

# Syncer configuration
syncer:
  # Image to use for the syncer
  # image: ghcr.io/loft-sh/vcluster
  imagePullPolicy: ""
  extraArgs: []
  volumeMounts: []
  extraVolumeMounts: []
  env: []
  livenessProbe:
    enabled: true
  readinessProbe:
    enabled: true
  resources:
    limits:
      ephemeral-storage: 8Gi
      cpu: 1000m
      memory: 512Mi
    requests:
      ephemeral-storage: 200Mi
      # ensure that cpu/memory requests are high enough.
      # for example gke wants minimum 10m/32Mi here!
      cpu: 20m
      memory: 64Mi
  # Extra volumes
  volumes: []
  # The amount of replicas to run the deployment with
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels for the syncer deployment
  labels: {}
  # Extra Annotations for the syncer deployment
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  priorityClassName: ""
  kubeConfigContextName: ""
  # Security context configuration
  securityContext: {}
  podSecurityContext: {}
  serviceAnnotations: {}

# Etcd settings
etcd:
  image: registry.k8s.io/etcd:3.5.12-0
  imagePullPolicy: ""
  # The amount of replicas to run
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels
  labels: {}
  # Extra Annotations
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  resources:
    requests:
      cpu: 20m
      memory: 150Mi
  # Storage settings for the etcd
  storage:
    # If this is disabled, vcluster will use an emptyDir instead
    # of a PersistentVolumeClaim
    persistence: true
    # Size of the persistent volume claim
    size: 5Gi
    # Optional StorageClass used for the pvc
    # if empty default StorageClass defined in your host cluster will be used
    className: <removed>
  priorityClassName: ""
  securityContext: {}
  serviceAnnotations: {}
  autoDeletePersistentVolumeClaims: true

# Kubernetes Controller Manager settings
controller:
  image: registry.k8s.io/kube-controller-manager:v1.27.10
  imagePullPolicy: ""
  # The amount of replicas to run the deployment with
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels
  labels: {}
  # Extra Annotations
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  resources:
    requests:
      cpu: 15m
  priorityClassName: ""
  securityContext: {}

# Kubernetes Scheduler settings. Only enabled if sync.nodes.enableScheduler is true
scheduler:
  image: registry.k8s.io/kube-scheduler:v1.27.10
  imagePullPolicy: ""
  # The amount of replicas to run the deployment with
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels
  labels: {}
  # Extra Annotations
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  resources:
    requests:
      cpu: 10m
  priorityClassName: ""

# Kubernetes API Server settings
api:
  image: registry.k8s.io/kube-apiserver:v1.27.10
  imagePullPolicy: ""
  extraArgs:
    - <removed>
  # NodeSelector used to schedule the syncer
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels for the syncer deployment
  labels: {}
  # Extra Annotations for the syncer deployment
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  resources:
    requests:
      cpu: 40m
      memory: 300Mi
  priorityClassName: ""
  securityContext: {}
  serviceAnnotations: {}

# Service account that should be used by the vcluster
serviceAccount:
  create: true
  # Optional name of the service account to use
  # name: default
  # Optional pull secrets
  # imagePullSecrets:
  #   - name: my-pull-secret

# Service account that should be used by the pods synced by vcluster
workloadServiceAccount:
  # This is not supported in multi-namespace mode
  annotations: {}

# Roles & ClusterRoles for the vcluster
rbac:
  clusterRole:
    # Deprecated !
    # Necessary cluster roles are created based on the enabled syncers (.sync.*.enabled)
    # Support for this value will be removed in a future version of the vcluster
    create: false
  role:
    # Deprecated !
    # Support for this value will be removed in a future version of the vcluster
    # and basic role will always be created
    create: true
    # Deprecated !
    # Necessary extended roles are created based on the enabled syncers (.sync.*.enabled)
    # Support for this value will be removed in a future version of the vcluster
    extended: false
    # all entries in excludedApiResources will be excluded from the Role created for vcluster
    excludedApiResources:
    # - pods/exec

# Syncer service configurations
service:
  type: ClusterIP
  # Optional configuration
  # A list of IP addresses for which nodes in the cluster will also accept traffic for this service.
  # These IPs are not managed by Kubernetes; e.g., an external load balancer.
  externalIPs: []
  # Optional configuration for LoadBalancer & NodePort service types
  # Route external traffic to node-local or cluster-wide endpoints [ Local | Cluster ]
  externalTrafficPolicy: ""
  # Optional configuration for LoadBalancer service type
  # Specify IP of load balancer to be created
  loadBalancerIP: ""
  # CIDR block(s) for the service allowlist
  loadBalancerSourceRanges: []
  # Set the loadBalancerClass if using an external load balancer controller
  loadBalancerClass: ""

# Configure the ingress resource that allows you to access the vcluster
ingress:
  # Enable ingress record generation
  enabled: false
  # Ingress path type
  pathType: ImplementationSpecific
  ingressClassName: ""
  host: vcluster.local
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  # Ingress TLS configuration
  tls: []
  # - secretName: tls-vcluster.local
  #   hosts:
  #     - vcluster.local

# Set "enable" to true when running vcluster in an OpenShift host
# This will add an extra rule to the deployed role binding in order
# to manage service endpoints
openshift:
  enable: false

# If enabled will deploy the coredns configmap
coredns:
  integrated: false
  enabled: true
  plugin:
    enabled: false
    config: []
    # example configuration for plugin syntax, will be documented in detail
    # - record:
    #     fqdn: google.com
    #   target:
    #     mode: url
    #     url: google.co.in
    # - record:
    #     service: my-namespace/my-svc  # dns-test/nginx-svc
    #   target:
    #     mode: host
    #     service: dns-test/nginx-svc
    # - record:
    #     service: my-namespace-lb/my-svc-lb
    #   target:
    #     mode: host
    #     service: dns-test-exposed-lb/nginx-svc-exposed-lb
    # - record:
    #     service: my-ns-external-name/my-svc-external-name
    #   target:
    #     mode: host
    #     service: dns-test-external-name/nginx-svc-external-name
    # - record:
    #     service: my-ns-in-vcluster/my-svc-vcluster
    #   target:
    #     mode: vcluster  # can be tested only manually for now
    #     vcluster: test-vcluster-ns/test-vcluster
    #     service: dns-test-in-vcluster-ns/test-in-vcluster-service
    # - record:
    #     service: my-ns-in-vcluster-mns/my-svc-mns
    #   target:
    #     mode: vcluster  # can be tested only manually for now
    #     service: dns-test-in-vcluster-mns/test-in-vcluster-svc-mns
    #     vcluster: test-vcluster-ns-mns/test-vcluster-mns
    # - record:
    #     service: my-self-vc-ns/my-self-vc-svc
    #   target:
    #     mode: self
    #     service: dns-test/nginx-svc
  replicas: 3
  # The nodeSelector example below specifies that coredns should only be scheduled to nodes with the arm64 label
  # nodeSelector:
  #   kubernetes.io/arch: arm64
  # image: my-core-dns-image:latest
  # config: |-
  #   .:1053 {
  #       ...
  # CoreDNS service configurations
  service:
    type: ClusterIP
    # Configuration for LoadBalancer service type
    externalIPs: []
    externalTrafficPolicy: ""
    # Extra Annotations
    annotations: {}
  resources:
    limits:
      cpu: 1000m
      memory: 170Mi
    requests:
      cpu: 3m
      memory: 16Mi
  # if below option is configured, it will override the coredns manifests with the following string
  # manifests: |-
  #   apiVersion: ...
  #   ...
  podAnnotations: {}
  podLabels: {}

# If enabled will deploy vcluster in an isolated mode with pod security
# standards, limit ranges and resource quotas
isolation:
  enabled: false
  namespace: null
  podSecurityStandard: baseline
  # If enabled will add node/proxy permission to the cluster role
  # in isolation mode
  nodeProxyPermission:
    enabled: false
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: 10
      requests.memory: 20Gi
      requests.storage: "100Gi"
      requests.ephemeral-storage: 60Gi
      limits.cpu: 20
      limits.memory: 40Gi
      limits.ephemeral-storage: 160Gi
      services.nodeports: 0
      services.loadbalancers: 1
      count/endpoints: 40
      count/pods: 20
      count/services: 20
      count/secrets: 100
      count/configmaps: 100
      count/persistentvolumeclaims: 20
    scopeSelector:
      matchExpressions:
    scopes:
  limitRange:
    enabled: true
    default:
      ephemeral-storage: 8Gi
      memory: 512Mi
      cpu: "1"
    defaultRequest:
      ephemeral-storage: 3Gi
      memory: 128Mi
      cpu: 100m
  networkPolicy:
    enabled: true
    outgoingConnections:
      ipBlock:
        cidr: 0.0.0.0/0
        except:
          - 100.64.0.0/10
          - 127.0.0.0/8
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16

# manifests to setup when initializing a vcluster
init:
  manifests: |-
    ---
  # The contents of manifests-template will be templated using helm
  # this allows you to use helm values inside, e.g.: {{ .Release.Name }}
  manifestsTemplate: ''
  helm: []
  # - bundle: <string> - base64-encoded .tar.gz file content (optional - overrides chart.repo)
  #   chart:
  #     name: <string> REQUIRED
  #     version: <string> REQUIRED
  #     repo: <string> (optional when bundle is used)
  #     username: <string> (if required for repo)
  #     password: <string> (if required for repo)
  #     insecure: boolean (if required for repo)
  #   release:
  #     name: <string> REQUIRED
  #     namespace: <string> REQUIRED
  #     timeout: number
  #     values: |- string YAML object
  #       foo: bar
  #     valuesTemplate: |- string YAML object
  #       foo: {{ .Release.Name }}

multiNamespaceMode:
  enabled: false

# list of {validating/mutating}webhooks that the syncer should proxy.
# This is a PRO only feature.
admission:
  validatingWebhooks: []
  mutatingWebhooks: []

telemetry:
  disabled: true
  instanceCreator: "helm"
  platformUserID: ""
  platformInstanceID: ""
  machineID: ""
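On the reconciliation question above: one way to rule the syncer in or out is to watch its logs while the failures occur and look for sync errors or conflicts around pod creation and deletion. A rough sketch, assuming the vcluster release is named knative-test in host namespace vcluster-knative-test and the syncer container is named syncer as in the default chart (all of these names are placeholders):

  # Placeholder names: adjust the namespace, release and container name to the
  # actual deployment before running.
  kubectl logs -n vcluster-knative-test deploy/knative-test -c syncer --since=30m \
    | grep -iE "error|conflict|failed"

  # The synced pods live in the same host namespace, so host-side events can be
  # compared against the virtual pods' events inside the vCluster:
  kubectl get events -n vcluster-knative-test --sort-by=.lastTimestamp | tail -n 30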
I can confirm Knative works inside vcluster. We have Knative + Istio installed inside vclusters. Are you running the Knative controllers in the host cluster or in the vclusters?
The Knative controllers are inside the vCluster. We are running Kourier as the ingress inside the vCluster. Knative works for 95% of requests; intermittently we see the termination/timeout messages detailed above. The requests are all long-lived HTTP requests, roughly 5 to 10 minutes.
@dspeck1 According to the error stack trace you posted, this was caused by the timeout handler in Knative's queue-proxy sidecar, so I'd suggest opening an issue over at knative/serving. Since 5-10 minutes is quite long for an HTTP request, it's quite possible you are hitting a timeout in queue-proxy.
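If that is the cause, the first thing to check would be Knative Serving's own timeout settings rather than anything in vcluster: the per-revision timeoutSeconds defaults to 300 seconds and is capped by max-revision-timeout-seconds in the config-defaults ConfigMap (600 seconds by default), so a 5-10 minute request can run past both. A minimal sketch of raising them, with illustrative values and a hypothetical service name long-runner:

  # Illustrative values; these are Knative Serving settings, not vcluster ones.
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: config-defaults
    namespace: knative-serving
  data:
    revision-timeout-seconds: "600"
    max-revision-timeout-seconds: "900"
  ---
  # Per-revision timeout on the affected Service (hypothetical name):
  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: long-runner
  spec:
    template:
      spec:
        timeoutSeconds: 660
        containers:
          - image: registry.example.com/long-runner:latest

Requests that outlive the effective timeout are cut off by queue-proxy regardless of whether Knative runs in a vCluster or on the host, so this is worth ruling out before digging into the sync settings.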
What happened?
Is Knative Serving supported and does it work inside of a vCluster? Knative Serving installs and we can serve traffic, but we see sporadic issues where queue-proxy/user-container pods fail with no apparent cause and return a 502. We noticed that there is a plugin to sync resources from the parent cluster. Is that because Knative isn't designed to work when installed inside of a vCluster? We are using Kourier as our ingress, installed inside the vCluster.
What did you expect to happen?
Knative to serve traffic reliably installed inside a vCluster.
How can we reproduce it (as minimally and precisely as possible)?
Install Knative inside a vCluster.
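For completeness, a rough sketch of that reproduction, using the values.yaml posted above and illustrative names and version numbers (the Knative release shown is only an example):

  # 1. Create the vCluster from the values above and connect to it
  vcluster create knative-test -n vcluster-knative-test -f values.yaml
  vcluster connect knative-test -n vcluster-knative-test

  # 2. Inside the vCluster, install Knative Serving and the Kourier networking layer
  kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.12.0/serving-crds.yaml
  kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.12.0/serving-core.yaml
  kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.12.0/kourier.yaml
  kubectl patch configmap/config-network -n knative-serving --type merge \
    -p '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'

  # 3. Deploy a Knative Service that holds connections open for several minutes and
  #    drive long-lived HTTP requests at it until a sporadic 502 appears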
Anything else we need to know?
Termination message is:
Host cluster Kubernetes version
Server Version: v1.27.10
Host cluster Kubernetes distribution
Open Source
vcluster version
0.18.1
Vcluster Kubernetes distribution (k3s (default), k8s, k0s)
k8s
OS and Arch
OS: Red Hat Linux, Arch: x86