odahu / odahu-automation

Apache License 2.0

Pass tolerations derived from all node pools (in appropriate group, e.g.: deployment, training, etc) to config #38

Closed vlad-tokarev closed 3 years ago

vlad-tokarev commented 4 years ago

Currently, we derive tolerations for our workloads only from the first node pool in the list of the appropriate node pools.

So if a second node pool is configured for GPU deployment, its taint nvidia/gpu=present is not used to derive the corresponding toleration for ModelDeployment workloads. As a result, there is no way to schedule a workload on this node pool, even with the right node selector.
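To illustrate the expected behaviour, here is a minimal sketch in Python. It is not the actual odahu-automation code: the node pool structure, the pool names, the dedicated=deployment taint, and the NoSchedule effects are all assumptions made for the example. Only the nvidia/gpu=present taint comes from this issue. The idea is to walk every node pool in the group and emit one toleration per distinct taint, instead of stopping at the first pool.

```python
from typing import Dict, List


def derive_tolerations(node_pools: List[Dict]) -> List[Dict]:
    """Collect one toleration per distinct taint across ALL node pools.

    The input layout (a list of dicts with a "taints" list) is hypothetical;
    the output uses the standard Kubernetes toleration fields.
    """
    tolerations: List[Dict] = []
    seen = set()
    for pool in node_pools:
        for taint in pool.get("taints", []):
            ident = (taint["key"], taint.get("value", ""), taint.get("effect", ""))
            if ident in seen:
                # The same taint on several pools needs only one toleration.
                continue
            seen.add(ident)
            toleration = {
                "key": taint["key"],
                "operator": "Equal",
                "value": taint.get("value", ""),
            }
            if taint.get("effect"):
                toleration["effect"] = taint["effect"]
            tolerations.append(toleration)
    return tolerations


# Hypothetical deployment group: a CPU pool first, a GPU pool second.
deployment_pools = [
    {
        "name": "deployment-cpu",  # assumed name
        "taints": [
            {"key": "dedicated", "value": "deployment", "effect": "NoSchedule"},
        ],
    },
    {
        "name": "deployment-gpu",  # assumed name
        "taints": [
            {"key": "dedicated", "value": "deployment", "effect": "NoSchedule"},
            # Taint from this issue; the NoSchedule effect is an assumption.
            {"key": "nvidia/gpu", "value": "present", "effect": "NoSchedule"},
        ],
    },
]

deployment_tolerations = derive_tolerations(deployment_pools)
```

With only the first pool considered, the nvidia/gpu toleration would be missing from the result. With all pools considered, it is present, and the taint shared by both pools still yields a single toleration.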

@karbyshevds @easokol please take a look. I think this should be fixed in the 1.4 release.