karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

A deployment with a GPU request can still be deployed to a member cluster without a GPU Node #4435

Open chaunceyjiang opened 11 months ago

chaunceyjiang commented 11 months ago

What would you like to be added:

A Deployment that requests GPU resources should only be scheduled to member clusters that have GPU nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: device-plugin
spec:
  replicas: 2
  selector:
    matchLabels:
      app: device-plugin
  template:
    metadata:
      labels:
        app: device-plugin
    spec:
      containers:
        - name: device-plugin
          image: busybox
          command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
          resources:
            limits:
              cpu: 2500m
              memory: 100Mi
              nvidia.com/vgpu: '1'
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: device-plugin-pp
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: device-plugin
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
        - member3
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
#      weightPreference:
#        dynamicWeight: AvailableReplicas

Why is this needed:

A deployment with a GPU request cannot start normally on a cluster without GPU nodes.

chaunceyjiang commented 11 months ago

https://github.com/karmada-io/karmada/issues/3318

Perhaps we can filter out clusters without GPU nodes at the preFilter stage.
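
To make the filtering idea concrete, here is a minimal sketch in Go. It is not Karmada's scheduler code: the `ClusterSummary` type, the `FilterByRequestedResources` function, and the per-cluster allocatable numbers are all hypothetical, chosen only to illustrate dropping clusters that report none of a requested resource before replicas are divided.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterSummary is a hypothetical, simplified view of a member cluster:
// its name and the total allocatable quantity of each resource.
type ClusterSummary struct {
	Name        string
	Allocatable map[string]int64
}

// FilterByRequestedResources returns the names of clusters that report a
// non-zero allocatable quantity for every resource the workload requests.
// A resource missing from the map counts as zero.
func FilterByRequestedResources(clusters []ClusterSummary, requests map[string]int64) []string {
	var feasible []string
	for _, c := range clusters {
		ok := true
		for name, qty := range requests {
			if qty > 0 && c.Allocatable[name] <= 0 {
				ok = false
				break
			}
		}
		if ok {
			feasible = append(feasible, c.Name)
		}
	}
	sort.Strings(feasible)
	return feasible
}

func main() {
	// Assumed inventory: only member1 has GPU nodes.
	clusters := []ClusterSummary{
		{Name: "member1", Allocatable: map[string]int64{"cpu": 8000, "memory": 16 << 30, "nvidia.com/vgpu": 4}},
		{Name: "member2", Allocatable: map[string]int64{"cpu": 8000, "memory": 16 << 30}},
		{Name: "member3", Allocatable: map[string]int64{"cpu": 4000, "memory": 8 << 30, "nvidia.com/vgpu": 0}},
	}
	// Requests taken from the Deployment above (CPU in millicores, memory in bytes).
	requests := map[string]int64{"cpu": 2500, "memory": 100 << 20, "nvidia.com/vgpu": 1}

	fmt.Println(FilterByRequestedResources(clusters, requests)) // prints [member1]
}
```

With a check like this in place, member2 and member3 would never receive replicas of the Deployment, regardless of the weights used for division.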

chaosi-zju commented 11 months ago

I have tested this and confirmed that it is a problem.

Besides, this is a good scenario for #3318.

tedli commented 11 months ago

Hi all,

FYI,

We ran into the same problem.

https://github.com/karmada-io/karmada/blob/7395a8bdf5e6ef934c49358b0536edbd0f794f34/pkg/util/resource.go#L52-L73

We tried making the lifted.IsScalarResourceName(rName) check on line 68 pass, and that fixes this issue.
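
The following Go sketch illustrates why a name check like this matters. It is not the code from the linked file: the `Resource` type, its `Add` method, and `isScalarResourceName` are simplified, hypothetical stand-ins. The assumption is that requests are folded into an aggregate, and that any resource name rejected by the check is silently dropped, so the scheduler never sees the GPU demand.

```go
package main

import (
	"fmt"
	"strings"
)

// Resource is a hypothetical, simplified aggregate of a workload's requests.
type Resource struct {
	MilliCPU        int64
	Memory          int64
	ScalarResources map[string]int64
}

// isScalarResourceName is a simplified stand-in for the real check: it accepts
// any domain-prefixed name that is not in the kubernetes.io namespace.
func isScalarResourceName(name string) bool {
	return strings.Contains(name, "/") && !strings.Contains(name, "kubernetes.io/")
}

// Add folds a request list into r. Names that are neither "cpu" nor "memory"
// are recorded only if isScalar accepts them; otherwise they are dropped.
func (r *Resource) Add(requests map[string]int64, isScalar func(string) bool) {
	for name, qty := range requests {
		switch name {
		case "cpu":
			r.MilliCPU += qty
		case "memory":
			r.Memory += qty
		default:
			if isScalar(name) {
				if r.ScalarResources == nil {
					r.ScalarResources = map[string]int64{}
				}
				r.ScalarResources[name] += qty
			}
		}
	}
}

func main() {
	requests := map[string]int64{"cpu": 2500, "memory": 100 << 20, "nvidia.com/vgpu": 1}

	// A check that rejects the name: the GPU request disappears from the aggregate.
	var dropped Resource
	dropped.Add(requests, func(string) bool { return false })
	fmt.Println(dropped.ScalarResources["nvidia.com/vgpu"]) // prints 0

	// A check that accepts the name: the GPU request is preserved.
	var kept Resource
	kept.Add(requests, isScalarResourceName)
	fmt.Println(kept.ScalarResources["nvidia.com/vgpu"]) // prints 1
}
```

Once the GPU quantity survives aggregation, it can be compared against what each member cluster reports as allocatable.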

chaunceyjiang commented 11 months ago

Hi @tedli, can you attend the meeting tomorrow afternoon (December 19th, at 2:30 PM)? I would like to discuss this issue with you.

tedli commented 11 months ago

Hi @chaunceyjiang, I will be there until 3:00 PM. Unfortunately, I have another meeting to attend after 3:00.