Open ruspaul013 opened 1 year ago
Hi @ruspaul013 and thanks for raising this issue.
I understand the problem here, but I am not sure it is within Nomad's or the device drivers' remit to perform conditional logic, such as blocking jobs from running if they include a particular env var when another job specification block is not present or configured. Scheduling this job on a cluster with heterogeneous clients is likely to result in placement on a client that doesn't have GPUs available, which is part of the rationale for the device drivers.
The main question that comes to mind is why can't this env block be removed if the job should not have access to GPUs?
Hello @jrasell, thanks for your reply.
> Scheduling this job on a cluster with heterogeneous clients is likely to result in placement on a client that doesn't have GPUs available, which is part of the rationale for the device drivers.
Unfortunately we don't have heterogeneous clients. All of our clients have GPUs.
> The main question that comes to mind is why can't this env block be removed if the job should not have access to GPUs?
The env block can be removed, but we thought that there is a way to restrict access to some env variables.
The problem we encountered is that Nomad reserves GPUs only if the job has a device block. A job that instead sets the NVIDIA_VISIBLE_DEVICES variable gets access to the same GPUs that Nomad has reserved for another job, if the two jobs are running at the same time.
Hi @ruspaul013, that all makes sense. I am not sure exactly what we can do, but I'll keep this issue open.
> we thought that there is a way to restrict access to some env variables.
Typically this kind of jobspec policy enforcement is handled by Sentinel, in the Nomad Enterprise product.
Nomad version
Nomad v1.4.3
Operating system and Environment details
Plugin "nomad-device-nvidia" v1.0.0
Plugin "nomad-driver-podman" v0.4.1
Issue
A task can use GPUs when it has a device block in resources. But if a user doesn't specify a device block and instead sets NVIDIA_VISIBLE_DEVICES in the env block, the task still has access to GPUs. Is there any way to prevent this from happening?
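For reference, the supported way to request a GPU is a device block inside resources, roughly like the sketch below. The cpu and memory values and the count are placeholder assumptions for illustration, not taken from our real jobs.

```hcl
resources {
  cpu    = 500 # placeholder value
  memory = 256 # placeholder value

  # With this block, Nomad reserves one GPU for the task through the
  # nomad-device-nvidia plugin and accounts for it during scheduling.
  device "nvidia/gpu" {
    count = 1
  }
}
```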
Reproduction steps
Run a job without a device block but with the env variable NVIDIA_VISIBLE_DEVICES set.
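A minimal jobspec along these lines should reproduce it. This is a sketch, not our exact job: the job, group, and task names, the datacenter, the image tag, and the resource values are all assumptions chosen for illustration.

```hcl
# Hypothetical reproduction job: no device block, GPUs requested only via env.
job "gpu-env-only" {
  datacenters = ["dc1"] # assumed datacenter name
  type        = "batch"

  group "app" {
    task "cuda" {
      driver = "podman"

      config {
        # Assumed image; any CUDA image that ships nvidia-smi should do.
        image   = "docker.io/nvidia/cuda:11.8.0-base-ubuntu22.04"
        command = "nvidia-smi"
      }

      # Nomad does not know about this request, so nothing is reserved.
      env {
        NVIDIA_VISIBLE_DEVICES = "all"
      }

      resources {
        cpu    = 500
        memory = 256
        # Intentionally no device "nvidia/gpu" block here.
      }
    }
  }
}
```

If the task output lists GPUs, the job reached them without Nomad reserving any device for it.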
Expected Result
A user who doesn't use a device block should not have access to GPUs.
Actual Result
The user has access to GPUs.
Thank you!