loft-sh / vcluster

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it offers better multi-tenancy and isolation than regular namespaces.
https://www.vcluster.com
Apache License 2.0
6.28k stars, 399 forks

v0.20.0 integrations metricsServer cannot work properly #2092

Open wutz opened 3 weeks ago

wutz commented 3 weeks ago

What happened?

After enabling the metricsServer integration, kubectl top node does not work properly inside the vCluster.

What did you expect to happen?

kubectl top node works normally.

How can we reproduce it (as minimally and precisely as possible)?

integrations:
  metricsServer:
    enabled: true
    nodes: true
    pods: true

Deploy with vCluster v0.20.0.

Anything else we need to know?

The previous version we used, 0.20.0-beta.10, had this feature working properly.

vCluster logging:

2024-08-22 09:36:35 INFO commandwriter/commandwriter.go:126 v1beta1.metrics.k8s.io failed with: failing or missing response from https://localhost:9001/apis/metrics.k8s.io/v1beta1: Get "https://localhost:9001/apis/metrics.k8s.io/v1beta1": context deadline exceeded {"component": "vcluster", "component": "k3s", "location": "available_controller.go:460"}

Host cluster Kubernetes version

```console
$ kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.7+k3s1
```

vcluster version

```console
$ vcluster --version
vcluster version 0.20.0
```

VCluster Config

```
sync:
  fromHost:
    ingressClasses:
      enabled: true
    storageClasses:
      enabled: true
  toHost:
    ingresses:
      enabled: true
    persistentVolumes:
      enabled: true

# Configure vCluster's control plane components and deployment.
controlPlane:
  distro:
    k3s:
      enabled: true
  backingStore:
    etcd:
      deploy:
        enabled: true
        statefulSet:
          highAvailability:
            replicas: 3
          scheduling:
            affinity:
              nodeAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 50
                  preference:
                    matchExpressions:
                    - key: node-role.kubernetes.io/control-plane
                      operator: Exists
            tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          persistence:
            volumeClaim:
              storageClass: "local-path"
  coredns:
    deployment:
      replicas: 2
  ingress:
    spec:
      ingressClassName: nginx
  statefulSet:
    highAvailability:
      replicas: 2
    scheduling:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule

policies:
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: "100"
      requests.memory: 520Gi
      requests.storage: 1000Gi
      requests.ephemeral-storage: 300Gi
      requests.nvidia.com/gpu: "8"
      limits.cpu: "100"
      limits.memory: 520Gi
      limits.ephemeral-storage: 300Gi
      services.loadbalancers: "0"
      services.nodeports: "100"
      count/endpoints: "200"
      count/pods: "100"
      count/services: "100"
      count/secrets: "500"
      count/configmaps: "500"
      count/persistentvolumeclaims: "100"
  limitRange:
    enabled: true
    default:
      cpu: "1"
      memory: "2Gi"
      ephemeral-storage: null
    defaultRequest:
      cpu: "1"
      memory: "2Gi"
      ephemeral-storage: null
  networkPolicy:
    enabled: true
  podSecurityStandard: baseline

integrations:
  metricsServer:
    enabled: true
```

FabianKramm commented 3 weeks ago

@wutz thanks for creating this issue! Is metrics server installed on the host cluster? Under which service is the metrics API reachable there?

wutz commented 3 weeks ago
  1. Yes, metrics server is installed on the host cluster.
  2. We use the default metrics-server that ships with k3s in the host cluster. I believe it ultimately serves the metrics API through the host cluster's k3s apiserver.
wutz commented 3 weeks ago

After I removed the NetworkPolicy vc-cp-xxx that vCluster created, kubectl top node works normally.

wutz commented 3 weeks ago

Alternatively, I edited the NetworkPolicy vc-cp-xxx to allow egress on port 10250:

spec:
  egress:
  - ports:
    - port: 443
      protocol: TCP
    - port: 8443
      protocol: TCP
    - port: 6443
      protocol: TCP
    - port: 10250
      protocol: TCP

With this change, kubectl top node also works in the vCluster.
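
For anyone who wants to try the same workaround without editing the policy by hand, here is a minimal sketch that builds a JSON Patch appending one TCP port to the first egress rule. It assumes the rule to extend is spec.egress[0], as in the snippet above; the helper name egress_port_patch and the NAMESPACE and POLICY placeholders are hypothetical and need to be filled in for your cluster.

```shell
# Build a JSON Patch (RFC 6902) that appends one TCP port to the ports
# list of the first egress rule. The trailing "-" in the path means
# "append to the end of the array".
# Assumption: the rule to extend is spec.egress[0].
egress_port_patch() {
  printf '[{"op":"add","path":"/spec/egress/0/ports/-","value":{"port":%d,"protocol":"TCP"}}]' "$1"
}

# Usage (not executed here; NAMESPACE and POLICY are placeholders):
#   kubectl -n "$NAMESPACE" patch networkpolicy "$POLICY" \
#     --type=json -p "$(egress_port_patch 10250)"
```

Since the policy is created by vCluster, a manual change like this may be reverted on the next upgrade, so it is probably best treated as a diagnostic step rather than a fix.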

FabianKramm commented 3 weeks ago

@wutz can you check the output of kubectl get apiservice v1beta1.metrics.k8s.io -o yaml to see where it points?

FabianKramm commented 3 weeks ago

We changed vCluster so that it now expects the metrics service at kube-system/metrics-server. If the service is somewhere else, you can configure it like this:

integrations:
  metricsServer:
    enabled: true
    apiService:
      service:
        name: my-metrics-service-name
        namespace: my-metrics-service-namespace
        port: 443
wutz commented 3 weeks ago
$ kubectl get apiservices.apiregistration.k8s.io v1beta1.metrics.k8s.io -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4SSQY/aMBCF/0o1Z5NiQlKw1APHqq2ExIr72JnAbIgd2U5WCPHfV4bArlZiOU7evJdvnnwC7HhLPrCzoNLgaccheozsbNYsQsbu5yBBQMO2AgWr9Z8N+YENgYCWIlYYEdQJ0FoXL7aQRqdfycRAMfPsMoMxHihlcQoB8VB3b5b8ZDc0oKDJwydlkOLHX7bV71VVOfs0wmJLoBKiZxMm2HG4cz93hg5Nsje9pkk4hkgtnAUcUNPh2/v2GPagoJjrgvRC53VdlLnBXC7nsvhVynK2NEuja6SaZlim0JF0kJoiymwkHstPC6Ejk/65867vPm66bYirsPbsPMfjf7bc9i0oOZ0KYBvI9J42DXcv/zZb8lwfQUXfk4BbI+oEX+pKCnm4wj0oY7i/mxEd7t9uLBeI8/k9AAD//wQqzKFnAgAA
    objectset.rio.cattle.io/id: ""
    objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
    objectset.rio.cattle.io/owner-name: metrics-apiservice
    objectset.rio.cattle.io/owner-namespace: kube-system
  creationTimestamp: "2024-04-17T08:02:00Z"
  labels:
    objectset.rio.cattle.io/hash: 54b5eb8b3ff563ca319415761629c9cbfaefe2a6
  name: v1beta1.metrics.k8s.io
  resourceVersion: "141950824"
  uid: d470de95-8fff-4642-a562-e5fe9ddbd7cf
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2024-08-22T19:47:07Z"
    message: all checks passed
    reason: Passed
    status: "True"
    type: Available
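
The part of this output that matters for the question is spec.service. As a rough sketch, the same information can be read in one line with a jsonpath query; the wrapper name apiservice_backend is hypothetical, and the command runs against whichever cluster the current kubeconfig context points to.

```shell
# Print "<namespace>/<name>:<port>" for the Service behind an APIService.
# Defaults to the metrics APIService discussed in this thread.
apiservice_backend() {
  kubectl get apiservice "${1:-v1beta1.metrics.k8s.io}" \
    -o jsonpath='{.spec.service.namespace}/{.spec.service.name}:{.spec.service.port}'
}

# Usage (run against the host cluster):
#   apiservice_backend
```

For the APIService shown above, this should print kube-system/metrics-server:443, which is the location vCluster expects by default.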
CiraciNicolo commented 1 week ago

I'm experiencing the same issue. Nothing has changed in my cluster since the update, but it seems vCluster cannot reach the metrics server. No NetworkPolicies are applied, so I don't think the issue is related to them.

facchettos commented 5 days ago

@CiraciNicolo, could you provide some more details? For example, your config, the output of the commands above, and anything else that might be relevant?