DeepSpeed fails to run on our GPU machine because the fused_adam op cannot be compiled, either in JIT mode or in pre-compiled mode.
I tried various versions of DeepSpeed and various versions of PyTorch. The only remaining variable I can think of at this point is the CUDA/NVVM version installed on our machine.
Since we can currently train on an A100 GPU without needing DeepSpeed, we have put this issue on hold.
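
If this is picked up again, the CUDA suspicion could be checked by comparing the toolkit version that nvcc reports with the CUDA version PyTorch was built against. The following is a minimal sketch, not a confirmed diagnosis: it assumes nvcc is on PATH and prints its usual "release X.Y" line, and the helper names (parse_nvcc_release, parse_version, describe) are made up for illustration. A mismatch would only be one possible cause of the fused_adam build failure.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_nvcc_release(output: str) -> Optional[Version]:
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 11.8'."""
    match = re.search(r"release\s+(\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_version(text: Optional[str]) -> Optional[Version]:
    """Extract (major, minor) from a version string such as '11.8'."""
    if not text:
        return None
    match = re.match(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def system_cuda_version() -> Optional[Version]:
    """CUDA toolkit version according to nvcc, or None if nvcc is unavailable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    try:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    return parse_nvcc_release(result.stdout)


def torch_cuda_version() -> Optional[Version]:
    """CUDA version PyTorch was built with, or None (no torch, or CPU-only build)."""
    try:
        import torch
    except ImportError:
        return None
    return parse_version(torch.version.cuda)


def describe(system: Optional[Version], torch_build: Optional[Version]) -> str:
    """Summarise how the two versions relate."""
    if system is None or torch_build is None:
        return f"unknown: system={system}, torch={torch_build}"
    if system == torch_build:
        return f"match: both are {system[0]}.{system[1]}"
    kind = "major" if system[0] != torch_build[0] else "minor"
    return (
        f"{kind} mismatch: system {system[0]}.{system[1]} "
        f"vs torch {torch_build[0]}.{torch_build[1]}"
    )


if __name__ == "__main__":
    print(describe(system_cuda_version(), torch_cuda_version()))
```

DeepSpeed also ships a ds_report command that prints its own environment and op compatibility summary, which may be worth running alongside this check.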