Hi, I run 3 machines with Pop!_OS and Nvidia GPUs.
They are mainly used as Docker hosts for AI experiments in nvidia/cuda-based containers.
Recently, all of these machines started repeatedly losing the ability to work with their GPUs: nvidia-smi reports an NVML driver/library version mismatch, PyTorch fails on CUDA placements, and so on. I've fixed the problem each time by removing and reinstalling the Nvidia drivers, but it comes back 4-5 days after the fix, and the same behaviour occurs on all 3 machines. To clarify: I do not run any updates or install any new packages on the host OS. The drivers just seem to stop working for no apparent reason.
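For reference, here is a minimal Python sketch of how the mismatch can be confirmed without relying on nvidia-smi (which errors out entirely once the versions diverge). The library path is an assumption based on the Debian/Ubuntu multiarch layout that Pop!_OS follows; adjust it if your libraries live elsewhere:

```python
import glob
from pathlib import Path

# Version of the currently loaded nvidia kernel module.
kernel_ver = Path("/sys/module/nvidia/version").read_text().strip()

# Versions of the userspace NVML libraries installed on disk; the full
# driver version is encoded in the shared object's file name.
# NOTE: this path assumes the Debian/Ubuntu multiarch layout on Pop!_OS.
libs = glob.glob("/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*.*")
lib_vers = {p.rsplit(".so.", 1)[-1] for p in libs}

print(f"kernel module version: {kernel_ver}")
print(f"NVML library versions: {sorted(lib_vers)}")
if kernel_ver not in lib_vers:
    print("MISMATCH: userspace libraries no longer match the loaded module")
```

If this shows the on-disk library version drifting away from the loaded module even though nothing was installed manually, it at least narrows the problem down to the userspace driver packages changing underneath the running kernel module.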
Has anyone experienced similar problems?