ncsa / puppet-profile_gpu

Puppet profile for GPU configuration

SVCPLAN-2704 Use custom fact to only enable DCGM on nodes with NVIDIA GPUs #2

Closed bsper2 closed 5 months ago

bsper2 commented 1 year ago

Right now we enable the DCGM install and telegraf collection by default, but ideally the profile should use a fact to enable them only when a node has an NVIDIA GPU.

Once that fact is in place, the README.md should be updated to remove the comments about turning DCGM off on nodes without NVIDIA GPUs.

billglick commented 6 months ago

A couple of possible ways to tell if an NVIDIA GPU is installed in a given host:

# SEARCH PCI DEVICES - PROBABLY BETTER FOR A CUSTOM FACT
lspci | grep -i nvidia | grep -Eiqw '3D|Tesla' && echo true || echo false

# RUN nvidia-smi - RELIES ON NVIDIA SOFTWARE TO BE INSTALLED
nvidia-smi | grep -q NVIDIA && echo true || echo false

billglick commented 6 months ago

I think the fact could be written something like the following:

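Facter.add(:nvidia_gpu) do
  confine kernel: 'Linux'
  setcode do
    # Return a real boolean: the string "false" would be truthy in Puppet.
    output = Facter::Core::Execution.execute('lspci', on_fail: '')
    # Same filter as: grep -i nvidia | grep -Eiqw '3D|Tesla'
    output.lines.any? { |line| line =~ /nvidia/i && line =~ /\b(3D|Tesla)\b/i }
  end
end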
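
The matching logic in the fact can be checked on a machine without a GPU if the parsing is separated from the `lspci` call. The following is a minimal sketch of that idea, not part of the module. The helper name `nvidia_compute_gpu?` is hypothetical, and the sample `lspci` lines are illustrative rather than captured from a real host.

```ruby
# Hypothetical helper: decide from lspci text whether an NVIDIA compute GPU
# is present. Mirrors `grep -i nvidia | grep -Eiqw '3D|Tesla'`.
def nvidia_compute_gpu?(lspci_output)
  lspci_output.each_line.any? do |line|
    line.match?(/nvidia/i) && line.match?(/\b(?:3D|Tesla)\b/i)
  end
end

# Illustrative lspci lines (assumed format, not taken from a real node).
tesla_host = "3b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)\n"
vga_host   = "01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)\n"
no_gpu     = "00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset SATA Controller (rev 05)\n"

puts nvidia_compute_gpu?(tesla_host) # true
puts nvidia_compute_gpu?(vga_host)   # false
puts nvidia_compute_gpu?(no_gpu)     # false
```

A `setcode` block could pass the output of `lspci` to such a helper. Note that with this filter an NVIDIA card listed only as a "VGA compatible controller" does not match, which may or may not be the desired behavior for DCGM.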

Eventually it may also be useful to list the type(s) of NVIDIA GPUs installed, but that is beyond the scope of this issue.