When I try to train Baichuan with DeepSpeed, I pass some arguments and get the errors below.
[real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/sunmy/anaconda3/envs/gra/lib/python3.9/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Unable to find hostfile, will proceed with training with local resources only.
/home/sunmy/anaconda3/envs/gra/lib/python3.9/site-packages/torch/cuda/__init__.py:628: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
Traceback (most recent call last):
File "/home/sunmy/anaconda3/envs/gra/bin/deepspeed", line 6, in <module>
main()
File "/home/sunmy/anaconda3/envs/gra/lib/python3.9/site-packages/deepspeed/launcher/runner.py", line 418, in main
raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available
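
The "Error 803: system has unsupported display driver / cuda driver combination" warning suggests the problem sits below DeepSpeed, so it may help to check what PyTorch and the NVIDIA driver each report before launching. Below is a minimal diagnostic sketch, not a fix. The helper name `collect_cuda_diagnostics` is hypothetical, and the script assumes `nvidia-smi` is on the PATH if a driver is installed.

```python
import shutil
import subprocess


def collect_cuda_diagnostics():
    """Hypothetical helper: gather versions relevant to a driver / CUDA mismatch."""
    info = {
        "torch_version": None,      # PyTorch release string
        "torch_cuda_build": None,   # CUDA version PyTorch was built against (None for CPU builds)
        "cuda_available": False,    # whether PyTorch can actually initialize CUDA
        "driver_version": None,     # NVIDIA driver version as reported by nvidia-smi
    }

    try:
        import torch
    except Exception as exc:  # missing package or broken shared libraries
        info["torch_version"] = f"import failed: {exc}"
    else:
        info["torch_version"] = torch.__version__
        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()

    smi = shutil.which("nvidia-smi")
    if smi is None:
        info["driver_version"] = "nvidia-smi not found on PATH"
        return info

    try:
        result = subprocess.run(
            [smi, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        info["driver_version"] = f"nvidia-smi failed: {exc}"
    else:
        # On a broken driver setup nvidia-smi usually explains itself on stdout or stderr.
        output = result.stdout.strip() or result.stderr.strip()
        info["driver_version"] = output or f"nvidia-smi exited with code {result.returncode}"
    return info


if __name__ == "__main__":
    for key, value in collect_cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False while `nvidia-smi` itself reports an error, the driver installation is the more likely culprit than the training arguments. If `nvidia-smi` works but PyTorch still cannot see a GPU, comparing `torch_cuda_build` against the CUDA version that driver supports would be the next thing to look at.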