pop-os / nvidia-graphics-drivers

Pop!_OS NVIDIA Graphics Drivers

Drivers occasionally stop working #154

Open Rexhaif opened 2 years ago

Rexhaif commented 2 years ago

Hi, I run 3 machines with Pop!_OS and NVIDIA GPUs:

  1. Threadripper 3970x + 2x 3090
  2. i9-10850K CPU + 3080 & 2080ti
  3. i9-10900 + 2080ti

They are mainly used as Docker hosts for AI experiments in nvidia/cuda-based containers.

Recently, all of these machines started repeatedly losing their ability to work with the GPUs: nvidia-smi reports an NVML driver/library version mismatch, PyTorch cannot use CUDA devices, etc. I fixed the problem by removing and reinstalling the NVIDIA drivers, but it comes back 4-5 days after the fix, and this happens on all 3 machines. To clarify: I do not run any updates or install any new packages on the host OS. It looks like the drivers just stop working for no reason.
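The NVML "driver/library version mismatch" error generally means the loaded kernel module and the user-space NVML library report different versions. Below is a minimal sketch for comparing the two on an affected host. The paths are assumptions, not something confirmed in this report: it assumes the loaded module exposes its version in `/sys/module/nvidia/version` and that the versioned NVML library is installed as `/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.<version>`, as is typical on Ubuntu-based systems. The function names are made up for illustration.

```python
import glob
import os
import re
from typing import List, Optional


def kernel_module_version(path: str = "/sys/module/nvidia/version") -> Optional[str]:
    """Return the version of the loaded nvidia kernel module, or None if unreadable.

    The default path is an assumption about where the module exposes its version.
    """
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def library_versions(
    pattern: str = "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*",
) -> List[str]:
    """Return the versions embedded in installed NVML library file names.

    The default pattern is an assumed install location; adjust it for your system.
    Plain sonames such as libnvidia-ml.so.1 are skipped because they carry
    no full driver version.
    """
    versions = set()
    for name in glob.glob(pattern):
        m = re.search(r"libnvidia-ml\.so\.(\d+(?:\.\d+)+)$", os.path.basename(name))
        if m:
            versions.add(m.group(1))
    return sorted(versions)


def describe(kernel: Optional[str], libs: List[str]) -> str:
    """Summarize whether the kernel module and library versions agree."""
    if kernel is None:
        return "nvidia kernel module is not loaded"
    if not libs:
        return "no versioned libnvidia-ml library found"
    if kernel in libs:
        return f"match: {kernel}"
    return f"mismatch: kernel module {kernel}, library {', '.join(libs)}"


if __name__ == "__main__":
    print(describe(kernel_module_version(), library_versions()))
```

If this reports a mismatch while the machine is in the broken state, the user-space library on disk changed after the kernel module was loaded, which would narrow down where to look next.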

Has anyone experienced similar problems?