dspeck1 opened this issue 5 months ago
@dspeck1 thanks for creating this issue! What values.yaml are you using to create the vCluster? Knative should work when installed inside the vCluster; the plugin is just there if you don't want to install it in every vCluster.
Thanks for helping! Below is the config. Please let me know if anything sticks out in the sync settings. I'm wondering if reconciliation between the parent cluster and the vcluster could be a cause of the issue, since pods are created and torn down frequently in Knative. (A sketch for checking the syncer logs follows the config.)
monitoring:
  serviceMonitor:
    enabled: false

enableHA: true

sync:
  services:
    enabled: true
  configmaps:
    enabled: true
    all: false
  secrets:
    enabled: true
    all: false
  endpoints:
    enabled: true
  pods:
    enabled: true
    ephemeralContainers: false
    status: false
  events:
    enabled: true
  persistentvolumeclaims:
    enabled: true
  ingresses:
    enabled: true
  ingressclasses:
    # By default IngressClasses sync is enabled when the Ingress sync is enabled
    # but it can be explicitly disabled by setting:
    enabled: false
  fake-nodes:
    enabled: false
  fake-persistentvolumes:
    enabled: false
  nodes:
    fakeKubeletIPs: false
    enabled: true
    # If nodes sync is enabled, and syncAllNodes = true, the virtual cluster
    # will sync all nodes instead of only the ones where some pods are running.
    syncAllNodes: true
    # nodeSelector is used to limit which nodes get synced to the vcluster,
    # and which nodes are used to run vcluster pods.
    # A valid string representation of a label selector must be used.
    # if true, vcluster will run with a scheduler and node changes are possible
    # from within the virtual cluster. This is useful if you would like to
    # taint, drain and label nodes from within the virtual cluster
    enableScheduler: false
    # DEPRECATED: use enable scheduler instead
    # syncNodeChanges allows vcluster user edits of the nodes to be synced down to the host nodes.
    # Write permissions on node resource will be given to the vcluster.
    syncNodeChanges: false
  persistentvolumes:
    enabled: false
  storageclasses:
    enabled: false
  # formerly named - "legacy-storageclasses"
  hoststorageclasses:
    enabled: true
  priorityclasses:
    enabled: false
  networkpolicies:
    enabled: false
  volumesnapshots:
    enabled: false
  poddisruptionbudgets:
    enabled: false
  serviceaccounts:
    enabled: false

# If enabled, will fallback to host dns for resolving domains. This
# is useful if using istio or dapr in the host cluster and sidecar
# containers cannot connect to the central instance. It's also useful
# if you want to access host cluster services from within the vcluster.
fallbackHostDns: false

# Map Services between host and virtual cluster
mapServices:
  # Services that should get mapped from the
  # virtual cluster to the host cluster.
  # vcluster will make sure to sync the service
  # ip to the host cluster automatically as soon
  # as the service exists.
  # For example:
  # fromVirtual:
  #   - from: my-namespace/name
  #     to: host-service
  fromVirtual: []
  # Same as from virtual, but instead sync services
  # from the host cluster into the virtual cluster.
  # If the namespace does not exist, vcluster will
  # also create the namespace for the service.
  fromHost: []

proxy:
  metricsServer:
    nodes:
      enabled: false
    pods:
      enabled: false

# Syncer configuration
syncer:
  # Image to use for the syncer
  # image: ghcr.io/loft-sh/vcluster
  imagePullPolicy: ""
  extraArgs: []
  volumeMounts: []
  extraVolumeMounts: []
  env: []
  livenessProbe:
    enabled: true
  readinessProbe:
    enabled: true
  resources:
    limits:
      ephemeral-storage: 8Gi
      cpu: 1000m
      memory: 512Mi
    requests:
      ephemeral-storage: 200Mi
      # ensure that cpu/memory requests are high enough.
      # for example gke wants minimum 10m/32Mi here!
      cpu: 20m
      memory: 64Mi
  # Extra volumes
  volumes: []
  # The amount of replicas to run the deployment with
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels for the syncer deployment
  labels: {}
  # Extra Annotations for the syncer deployment
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  priorityClassName: ""
  kubeConfigContextName: ""
  # Security context configuration
  securityContext: {}
  podSecurityContext: {}
  serviceAnnotations: {}

# Etcd settings
etcd:
  image: registry.k8s.io/etcd:3.5.12-0
  imagePullPolicy: ""
  # The amount of replicas to run
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels
  labels: {}
  # Extra Annotations
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  resources:
    requests:
      cpu: 20m
      memory: 150Mi
  # Storage settings for the etcd
  storage:
    # If this is disabled, vcluster will use an emptyDir instead
    # of a PersistentVolumeClaim
    persistence: true
    # Size of the persistent volume claim
    size: 5Gi
    # Optional StorageClass used for the pvc
    # if empty default StorageClass defined in your host cluster will be used
    className: <removed>
  priorityClassName: ""
  securityContext: {}
  serviceAnnotations: {}
  autoDeletePersistentVolumeClaims: true

# Kubernetes Controller Manager settings
controller:
  image: registry.k8s.io/kube-controller-manager:v1.27.10
  imagePullPolicy: ""
  # The amount of replicas to run the deployment with
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels
  labels: {}
  # Extra Annotations
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  resources:
    requests:
      cpu: 15m
  priorityClassName: ""
  securityContext: {}

# Kubernetes Scheduler settings. Only enabled if sync.nodes.enableScheduler is true
scheduler:
  image: registry.k8s.io/kube-scheduler:v1.27.10
  imagePullPolicy: ""
  # The amount of replicas to run the deployment with
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels
  labels: {}
  # Extra Annotations
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  resources:
    requests:
      cpu: 10m
  priorityClassName: ""

# Kubernetes API Server settings
api:
  image: registry.k8s.io/kube-apiserver:v1.27.10
  imagePullPolicy: ""
  extraArgs:
    - <removed>
  # NodeSelector used to schedule the syncer
  replicas: 3
  # Affinity to apply to the syncer deployment
  affinity: {}
  # Extra Labels for the syncer deployment
  labels: {}
  # Extra Annotations for the syncer deployment
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  resources:
    requests:
      cpu: 40m
      memory: 300Mi
  priorityClassName: ""
  securityContext: {}
  serviceAnnotations: {}

# Service account that should be used by the vcluster
serviceAccount:
  create: true
  # Optional name of the service account to use
  # name: default
  # Optional pull secrets
  # imagePullSecrets:
  #   - name: my-pull-secret

# Service account that should be used by the pods synced by vcluster
workloadServiceAccount:
  # This is not supported in multi-namespace mode
  annotations: {}

# Roles & ClusterRoles for the vcluster
rbac:
  clusterRole:
    # Deprecated !
    # Necessary cluster roles are created based on the enabled syncers (.sync.*.enabled)
    # Support for this value will be removed in a future version of the vcluster
    create: false
  role:
    # Deprecated !
    # Support for this value will be removed in a future version of the vcluster
    # and basic role will always be created
    create: true
    # Deprecated !
    # Necessary extended roles are created based on the enabled syncers (.sync.*.enabled)
    # Support for this value will be removed in a future version of the vcluster
    extended: false
    # all entries in excludedApiResources will be excluded from the Role created for vcluster
    excludedApiResources:
    # - pods/exec

# Syncer service configurations
service:
  type: ClusterIP
  # Optional configuration
  # A list of IP addresses for which nodes in the cluster will also accept traffic for this service.
  # These IPs are not managed by Kubernetes; e.g., an external load balancer.
  externalIPs: []
  # Optional configuration for LoadBalancer & NodePort service types
  # Route external traffic to node-local or cluster-wide endpoints [ Local | Cluster ]
  externalTrafficPolicy: ""
  # Optional configuration for LoadBalancer service type
  # Specify IP of load balancer to be created
  loadBalancerIP: ""
  # CIDR block(s) for the service allowlist
  loadBalancerSourceRanges: []
  # Set the loadBalancerClass if using an external load balancer controller
  loadBalancerClass: ""

# Configure the ingress resource that allows you to access the vcluster
ingress:
  # Enable ingress record generation
  enabled: false
  # Ingress path type
  pathType: ImplementationSpecific
  ingressClassName: ""
  host: vcluster.local
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  # Ingress TLS configuration
  tls: []
  # - secretName: tls-vcluster.local
  #   hosts:
  #     - vcluster.local

# Set "enable" to true when running vcluster in an OpenShift host
# This will add an extra rule to the deployed role binding in order
# to manage service endpoints
openshift:
  enable: false

# If enabled will deploy the coredns configmap
coredns:
  integrated: false
  enabled: true
  plugin:
    enabled: false
    config: []
    # example configuration for plugin syntax, will be documented in detail
    # - record:
    #     fqdn: google.com
    #   target:
    #     mode: url
    #     url: google.co.in
    # - record:
    #     service: my-namespace/my-svc  # dns-test/nginx-svc
    #   target:
    #     mode: host
    #     service: dns-test/nginx-svc
    # - record:
    #     service: my-namespace-lb/my-svc-lb
    #   target:
    #     mode: host
    #     service: dns-test-exposed-lb/nginx-svc-exposed-lb
    # - record:
    #     service: my-ns-external-name/my-svc-external-name
    #   target:
    #     mode: host
    #     service: dns-test-external-name/nginx-svc-external-name
    # - record:
    #     service: my-ns-in-vcluster/my-svc-vcluster
    #   target:
    #     mode: vcluster  # can be tested only manually for now
    #     vcluster: test-vcluster-ns/test-vcluster
    #     service: dns-test-in-vcluster-ns/test-in-vcluster-service
    # - record:
    #     service: my-ns-in-vcluster-mns/my-svc-mns
    #   target:
    #     mode: vcluster  # can be tested only manually for now
    #     service: dns-test-in-vcluster-mns/test-in-vcluster-svc-mns
    #     vcluster: test-vcluster-ns-mns/test-vcluster-mns
    # - record:
    #     service: my-self-vc-ns/my-self-vc-svc
    #   target:
    #     mode: self
    #     service: dns-test/nginx-svc
  replicas: 3
  # The nodeSelector example below specifies that coredns should only be scheduled to nodes with the arm64 label
  # nodeSelector:
  #   kubernetes.io/arch: arm64
  # image: my-core-dns-image:latest
  # config: |-
  #   .:1053 {
  #       ...
  # CoreDNS service configurations
  service:
    type: ClusterIP
    # Configuration for LoadBalancer service type
    externalIPs: []
    externalTrafficPolicy: ""
    # Extra Annotations
    annotations: {}
  resources:
    limits:
      cpu: 1000m
      memory: 170Mi
    requests:
      cpu: 3m
      memory: 16Mi
  # if below option is configured, it will override the coredns manifests with the following string
  # manifests: |-
  #   apiVersion: ...
  #   ...
  podAnnotations: {}
  podLabels: {}

# If enabled will deploy vcluster in an isolated mode with pod security
# standards, limit ranges and resource quotas
isolation:
  enabled: false
  namespace: null
  podSecurityStandard: baseline
  # If enabled will add node/proxy permission to the cluster role
  # in isolation mode
  nodeProxyPermission:
    enabled: false
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: 10
      requests.memory: 20Gi
      requests.storage: "100Gi"
      requests.ephemeral-storage: 60Gi
      limits.cpu: 20
      limits.memory: 40Gi
      limits.ephemeral-storage: 160Gi
      services.nodeports: 0
      services.loadbalancers: 1
      count/endpoints: 40
      count/pods: 20
      count/services: 20
      count/secrets: 100
      count/configmaps: 100
      count/persistentvolumeclaims: 20
    scopeSelector:
      matchExpressions:
    scopes:
  limitRange:
    enabled: true
    default:
      ephemeral-storage: 8Gi
      memory: 512Mi
      cpu: "1"
    defaultRequest:
      ephemeral-storage: 3Gi
      memory: 128Mi
      cpu: 100m
  networkPolicy:
    enabled: true
    outgoingConnections:
      ipBlock:
        cidr: 0.0.0.0/0
        except:
          - 100.64.0.0/10
          - 127.0.0.0/8
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16

# manifests to setup when initializing a vcluster
init:
  manifests: |-
    ---
  # The contents of manifests-template will be templated using helm
  # this allows you to use helm values inside, e.g.: {{ .Release.Name }}
  manifestsTemplate: ''
  helm: []
  # - bundle: <string> - base64-encoded .tar.gz file content (optional - overrides chart.repo)
  #   chart:
  #     name: <string> REQUIRED
  #     version: <string> REQUIRED
  #     repo: <string> (optional when bundle is used)
  #     username: <string> (if required for repo)
  #     password: <string> (if required for repo)
  #     insecure: boolean (if required for repo)
  #   release:
  #     name: <string> REQUIRED
  #     namespace: <string> REQUIRED
  #     timeout: number
  #     values: |- string YAML object
  #       foo: bar
  #     valuesTemplate: |- string YAML object
  #       foo: {{ .Release.Name }}

multiNamespaceMode:
  enabled: false

# list of {validating/mutating}webhooks that the syncer should proxy.
# This is a PRO only feature.
admission:
  validatingWebhooks: []
  mutatingWebhooks: []

telemetry:
  disabled: true
  instanceCreator: "helm"
  platformUserID: ""
  platformInstanceID: ""
  machineID: ""
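On the reconciliation question above: one way to rule the syncer in or out is to watch its logs while the failures occur and look for sync errors or conflicts around pod creation and deletion. A rough sketch, assuming the vcluster release is named knative-test in host namespace vcluster-knative-test and the syncer container is named syncer as in the default chart (all of these names are placeholders):

  # Placeholder names: adjust the namespace, release and container name to the
  # actual deployment before running.
  kubectl logs -n vcluster-knative-test deploy/knative-test -c syncer --since=30m \
    | grep -iE "error|conflict|failed"

  # The synced pods live in the same host namespace, so host-side events can be
  # compared against the virtual pods' events inside the vCluster:
  kubectl get events -n vcluster-knative-test --sort-by=.lastTimestamp | tail -n 30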
I can confirm Knative works inside vcluster. We have Knative + Istio installed inside vclusters. Are you running the Knative controllers in the host cluster or in the vclusters?
The Knative controllers are inside the vCluster. We are running Kourier as the ingress inside the vCluster. Knative works for 95% of requests; intermittently we see the termination/timeout messages detailed above. The requests are all long-lived HTTP requests, roughly 5 to 10 minutes.
@dspeck1 According to the error stack trace you posted, this was caused by the timeout handler in Knative's queue-proxy sidecar, so I'd suggest opening an issue over at knative/serving. Since 5-10 minutes is quite long for an HTTP request, it's quite possible you are hitting a timeout in queue-proxy.
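If that is the cause, the first thing to check would be Knative Serving's own timeout settings rather than anything in vcluster: the per-revision timeoutSeconds defaults to 300 seconds and is capped by max-revision-timeout-seconds in the config-defaults ConfigMap (600 seconds by default), so a 5-10 minute request can run past both. A minimal sketch of raising them, with illustrative values and a hypothetical service name long-runner:

  # Illustrative values; these are Knative Serving settings, not vcluster ones.
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: config-defaults
    namespace: knative-serving
  data:
    revision-timeout-seconds: "600"
    max-revision-timeout-seconds: "900"
  ---
  # Per-revision timeout on the affected Service (hypothetical name):
  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: long-runner
  spec:
    template:
      spec:
        timeoutSeconds: 660
        containers:
          - image: registry.example.com/long-runner:latest

Requests that outlive the effective timeout are cut off by queue-proxy regardless of whether Knative runs in a vCluster or on the host, so this is worth ruling out before digging into the sync settings.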
What happened?
Is Knative Serving supported and does it work inside of a vCluster? Knative Serving installs and we can serve traffic, but we see sporadic issues where queue-proxy/user-container pods fail with no apparent cause and return a 502. We noticed that there is a plugin to sync resources from the parent cluster. Is that because Knative isn't designed to work when installed inside of a vCluster? We are using Kourier as our ingress, installed inside the vCluster.
What did you expect to happen?
Knative to serve traffic reliably installed inside a vCluster.
How can we reproduce it (as minimally and precisely as possible)?
Install Knative inside a vCluster.
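For completeness, a rough sketch of that reproduction, using the values.yaml posted above and illustrative names and version numbers (the Knative release shown is only an example):

  # 1. Create the vCluster from the values above and connect to it
  vcluster create knative-test -n vcluster-knative-test -f values.yaml
  vcluster connect knative-test -n vcluster-knative-test

  # 2. Inside the vCluster, install Knative Serving and the Kourier networking layer
  kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.12.0/serving-crds.yaml
  kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.12.0/serving-core.yaml
  kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.12.0/kourier.yaml
  kubectl patch configmap/config-network -n knative-serving --type merge \
    -p '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'

  # 3. Deploy a Knative Service that holds connections open for several minutes and
  #    drive long-lived HTTP requests at it until a sporadic 502 appears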
Anything else we need to know?
Termination message is:
Host cluster Kubernetes version
Server Version: v1.27.10
Host cluster Kubernetes distribution
Open Source
vcluster version
0.18.1
Vcluster Kubernetes distribution (k3s (default), k8s, k0s)
k8s
OS and Arch
OS: Red Hat Linux, Arch: x86