Open viniciusdc opened 3 months ago
More details are in the original thread, but the main problem was that our fix for the scale-to-zero issue introduced a new tagging mechanism using the `dedicated` attribute in each profile. This was not documented anywhere that I could find, and our GPU docs were not only not migrated (or were removed) but also didn't follow the new schema.
A recent deployment of Nebari 2024.03.03 on AWS with a `g4dx.xlarge` GPU profile has led to an issue where, despite the CUDA-related packages appearing correctly configured, `torch.cuda.is_available()` still returns `False`. This indicates a failure to recognize the GPU CUDA drivers. Additionally, the `nvidia-smi` command is not found, which suggests potential issues with NVIDIA driver integration or installation (handled non-implicitly by the existence of `gpu: true` in the configuration settings).
Steps to resolve this issue:
Current configuration profile:
Additional details
Relevant issue #2392
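
For anyone trying to reproduce the two symptoms described above (`nvidia-smi` not found, and `torch.cuda.is_available()` returning `False`), here is a minimal diagnostic sketch that could be run from a notebook or terminal on the GPU profile. The function name `gpu_diagnostics` and the report keys are made up for illustration and are not part of Nebari. The script only assumes that `torch` may or may not be installed in the active environment.

```python
import importlib.util
import shutil
import subprocess


def gpu_diagnostics():
    """Collect the two symptoms from this issue into a plain dict.

    Hypothetical helper for illustration; not part of Nebari or PyTorch.
    """
    report = {}

    # Symptom 1: is the nvidia-smi binary on PATH at all?
    # shutil.which returns None when the command cannot be found.
    smi_path = shutil.which("nvidia-smi")
    report["nvidia_smi_path"] = smi_path

    if smi_path is not None:
        # "nvidia-smi -L" lists the GPUs the driver can see.
        try:
            result = subprocess.run(
                [smi_path, "-L"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            report["nvidia_smi_output"] = (result.stdout or result.stderr).strip()
        except (OSError, subprocess.SubprocessError) as exc:
            report["nvidia_smi_output"] = f"failed to run: {exc}"

    # Symptom 2: does PyTorch see a usable CUDA device?
    if importlib.util.find_spec("torch") is None:
        report["torch_installed"] = False
        report["cuda_available"] = None
    else:
        import torch

        report["torch_installed"] = True
        report["torch_version"] = torch.__version__
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()

    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_path` comes back as `None` while `torch_cuda_build` reports a CUDA version, the environment itself is likely fine and the problem is more likely on the node or driver side, which matches what is described in this issue.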