loft-sh / vcluster

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it offers better multi-tenancy and isolation than regular namespaces.
https://www.vcluster.com
Apache License 2.0

Syncing StatefulSet fails due to 'forbidden: failed quota' #179

Closed: MShekow closed this issue 2 years ago

MShekow commented 2 years ago

I'm unable to install a Helm chart in the virtual cluster. The chart installs two components: a Deployment (whose Pod is synced to the host cluster successfully), and a StatefulSet. The host cluster requires me to provide limits/requests, and also has resource quotas set.

My Helm chart sets both requests and limits for the StatefulSet, as the following (abridged) manifest of the resulting pod illustrates:

apiVersion: v1
kind: Pod
metadata:
  name: review-release-pmtk-pm4pyws-0
  namespace: ft-dummy-feature
  labels:
    component: pm4py-ws
    statefulset.kubernetes.io/pod-name: review-release-pmtk-pm4pyws-0
spec:
  volumes:
    ...
  containers:
    - name: my-image
      image: >-
        myimage:latest
      ports:
        ...
      resources:
        limits:
          cpu: '1'
          memory: 512Mi
        requests:
          cpu: '1'
          memory: 512Mi
      volumeMounts:
       ...
      imagePullPolicy: Always
  restartPolicy: Always

Unfortunately, vcluster fails to sync this pod to the host cluster. This is the full error message:

E1105 11:23:32.543040 1 controller.go:302] controller-runtime: manager: reconciler group reconciler kind Pod: controller: pod-forward: name review-release-pmtk-pm4pyws-0 namespace ft-dummy-feature: Reconciler error pods "review-release-pmtk-pm4pyws-0-x-ft-dummy-feature-x-v" is forbidden: failed quota: default-tsvb6: must specify limits.cpu,limits.memory,requests.cpu,requests.memory

When retrieving information about the default-tsvb6 ResourceQuota (in the host cluster's namespace), I get this:

status:
  hard:
    limits.cpu: '8'
    limits.memory: 32Gi
    requests.cpu: '4'
    requests.memory: 16Gi
    requests.storage: 25Gi
  used:
    limits.cpu: 2250m
    limits.memory: 2346Mi
    requests.cpu: 1850m
    requests.memory: 1222Mi
    requests.storage: 11Gi
spec:
  hard:
    limits.cpu: '8'
    limits.memory: 32Gi
    requests.cpu: '4'
    requests.memory: 16Gi
    requests.storage: 25Gi
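
For context, the quota above is far from exhausted, so the rejection is about some container specifying no resources at all ("must specify limits.cpu,..."), not about capacity. A quick sanity check of the headroom (values copied from the status above; the parse_qty helper is a simplified stand-in, not part of vcluster or Kubernetes):

```python
# Sanity check: does the pod even fit into the default-tsvb6 quota?
# hard/used values are copied from the ResourceQuota status shown above.

def parse_qty(q: str) -> float:
    """Parse a small subset of Kubernetes quantities:
    plain numbers (cores), millicores ('m'), and Mi/Gi byte suffixes."""
    if q.endswith("m"):
        return float(q[:-1]) / 1000.0   # millicores -> cores
    if q.endswith("Gi"):
        return float(q[:-2]) * 1024**3  # GiB -> bytes
    if q.endswith("Mi"):
        return float(q[:-2]) * 1024**2  # MiB -> bytes
    return float(q)

hard = {"requests.cpu": "4",     "requests.memory": "16Gi"}
used = {"requests.cpu": "1850m", "requests.memory": "1222Mi"}
pod  = {"requests.cpu": "1",     "requests.memory": "512Mi"}  # from the pod above

for key in hard:
    headroom = parse_qty(hard[key]) - parse_qty(used[key])
    assert parse_qty(pod[key]) <= headroom, f"quota exhausted for {key}"
```

Roughly 2.15 CPU cores and about 14.8 GiB of request headroom remain, comfortably above the pod's 1 CPU / 512Mi requests.
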

FabianKramm commented 2 years ago

@MShekow thanks a lot for this issue! I just tested this and it works fine for me, maybe there is a sidecar container injected that is missing the resources. You can also create a limit range in the host cluster that will set the missing limits automatically if not specified.
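
As a concrete sketch of that LimitRange suggestion (the name and the default values below are illustrative, not taken from this issue), something like this in the host cluster's namespace would fill in resources for any container that omits them, including injected sidecars:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits   # illustrative name
spec:
  limits:
    - type: Container
      # Applied as limits.cpu / limits.memory when a container specifies none
      default:
        cpu: 500m
        memory: 256Mi
      # Applied as requests.cpu / requests.memory when a container specifies none
      defaultRequest:
        cpu: 250m
        memory: 128Mi
```

Once such a LimitRange exists, the ResourceQuota admission check should no longer reject pods whose sidecar containers lack explicit resources, because the defaults are injected at admission time.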

MShekow commented 2 years ago

@FabianKramm I cannot explain what the cause is. I can install the following manifest just fine into the host cluster (into my limited namespace), but installing it into the vcluster will produce the error I mentioned above:

apiVersion: v1
kind: Service
metadata:
  name: test-pmtk-pm4pyws-headless
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
      name: http
  clusterIP: None
  selector:
    component: pm4py-ws
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-pmtk-pm4pyws
spec:
  replicas: 1
  selector:
    matchLabels:
      component: pm4py-ws
  serviceName: test-pmtk-pm4pyws-headless
  template:
    metadata:
      labels:
        component: pm4py-ws
    spec:
      containers:
        - name: pm4py-ws
          image: "nginxinc/nginx-unprivileged:1.20.1-alpine"
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 1000m
              memory: 512Mi
            requests:
              cpu: 1000m
              memory: 512Mi

The same error is shown when I remove the limits block. I also cannot test creating a LimitRange in the host cluster because I lack the permissions.

Note: if, in the above example, I replace StatefulSet with Deployment, and comment out the serviceName (to have valid syntax), the pod is synced successfully. I'm fully aware that this makes no sense whatsoever...

We are using RKE, Kubernetes version v1.20.4.

matskiv commented 2 years ago

I suspect this was caused by the "vcluster-rewrite-hosts" sidecar that vcluster injects. That was fixed in PR #400 and released in v0.7.0.

Please let us know if you still observe the issue in v0.7.0+.