Open · bysph opened this issue 10 months ago
What if you just deploy two different Volcano scheduler and controller deployments and check whether that works?
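For reference, a second scheduler instance might look roughly like this; the release name, namespace, image tag, and the `volcano-spark` scheduler name are illustrative, and the key part is giving each instance its own `--scheduler-name`:

```yaml
# Sketch: a second vc-scheduler Deployment with its own scheduler name.
# Names, namespace, and image tag are illustrative, not from the chart.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: volcano-scheduler-spark
  namespace: volcano-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: volcano-scheduler-spark
  template:
    metadata:
      labels:
        app: volcano-scheduler-spark
    spec:
      serviceAccountName: volcano-scheduler
      containers:
        - name: volcano-scheduler
          image: volcanosh/vc-scheduler:latest
          args:
            # Pods handled by this instance must set
            # spec.schedulerName: volcano-spark
            - --scheduler-name=volcano-spark
            - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
          volumeMounts:
            - name: scheduler-config
              mountPath: /volcano.scheduler
      volumes:
        - name: scheduler-config
          configMap:
            name: volcano-scheduler-spark-configmap  # hypothetical name
```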
Deploying with YAML is OK, but I've encountered a new issue. Volcano supports multiple schedulers managing different nodes, but it seems they cannot manage different queues, which results in incomplete isolation between different types of workloads. For example, the "reserved" resources of a Flink queue may affect the decisions of a Volcano scheduler that is meant only for Spark. How do you solve this kind of issue?
We also have a nodeGroup plugin that can set node affinity on a queue; this might be a way to solve it.
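Roughly, it could look like this; the queue and node group names are made up, and the exact field names should be checked against the nodegroup design doc for your Volcano version:

```yaml
# Sketch: bind a queue to a node group via the nodegroup plugin.
# Nodes in the group carry the label volcano.sh/nodegroup-name=spark-nodes.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: spark-queue
spec:
  weight: 1
  affinity:
    nodeGroupAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - spark-nodes   # this queue may only use the spark node group
    nodeGroupAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - flink-nodes   # and must stay off the flink node group
```

The `nodegroup` plugin also has to be enabled in the scheduler configuration for these fields to take effect.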
Will I encounter this issue when using that feature: the monitoring is no longer accurate, the queue shows available resources, but scheduling ultimately cannot complete because of node affinity? It also seems unable to meet my requirement of having different Volcano schedulers claim different queues.
Preempt and Reclaim are both node-level actions. Although a queue is chosen first, they only traverse the nodes that belong to the current scheduler, so I think queues and nodes that do not belong to the current scheduler will not be chosen or reclaimed.
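For context, those actions are turned on in the scheduler configuration, e.g. something like the following (the plugin list here is illustrative); preempt and reclaim then only consider nodes visible to that scheduler instance:

```yaml
# Illustrative volcano-scheduler.conf enabling preempt and reclaim.
actions: "enqueue, allocate, preempt, reclaim"
tiers:
  - plugins:
      - name: priority
      - name: gang
      - name: conformance
  - plugins:
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodegroup   # restricts each queue to its node group
      - name: nodeorder
```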
What happened: According to https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-volcano-schedulers.md, we can install multiple Volcano schedulers for scheduling different kinds of workloads. But I encountered an error when trying to install another Helm release named "volcano-spark" in a Kubernetes cluster that already had a "volcano" Helm release installed.
What you expected to happen: There should be a parameter to control whether CRD installation can be skipped.
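As a possible workaround, if the chart shipped its CRDs in a `crds/` directory, Helm's built-in `--skip-crds` install flag could skip installing them for the second release; I am not sure how the Volcano chart currently packages its CRDs, so this may not apply.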
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: Additionally, certain parameters such as basic.scheduler_app_name in the installation document at https://github.com/volcano-sh/volcano/tree/master/installer appear to be outdated. Is it possible that this documentation has not been kept up to date? Initially, I assumed this parameter was intended for installing multiple Volcano instances, but I did not find it in the Helm chart.
Environment:
- Kubernetes version (use `kubectl version`):
- Kernel (e.g. `uname -a`):