Open zhouyu5 opened 1 month ago
Hello @zhouyu5, thanks. Your case makes sense to us. We would like to implement an environment variable through which you can specify the path to device_dir; by default it will remain the standard path, /dev/dri/by-path/. That should help to address your issue.
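To illustrate the kind of override described above, here is a minimal Python sketch of "environment variable with a default". It is not oneCCL code: the variable name EXAMPLE_ZE_DEVICE_DIR and the helper resolve_device_dir are hypothetical, chosen only for this example, and the actual name and behaviour are up to the maintainers.

```python
import os

# Hypothetical variable name, used only for this illustration.
DEVICE_DIR_ENV = "EXAMPLE_ZE_DEVICE_DIR"
# Default taken from the discussion above.
DEFAULT_DEVICE_DIR = "/dev/dri/by-path/"


def resolve_device_dir(environ=None):
    """Return the directory to scan for device nodes.

    Uses the override from the environment when it is set and non-empty,
    otherwise falls back to the default path.
    """
    env = os.environ if environ is None else environ
    override = env.get(DEVICE_DIR_ENV, "").strip()
    path = override or DEFAULT_DEVICE_DIR
    # Keep a trailing slash so file names can be appended directly.
    return path if path.endswith("/") else path + "/"
```

With such an override, a container that only exposes /dev/dri could point the lookup at a directory that actually exists, while hosts that set nothing keep the current behaviour.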
Thanks, sounds great.
Hi developers, I found that the variable device_dir is hard-coded to /dev/dri/by-path/ (see code here). In most cases this is not a problem, but in some cases it does not work well. Take mine as an example: I set up the environment in a Docker container, and I start the container with the following command:
docker run --device=/dev/dri ...
After launching the training, I hit this error:
RuntimeError: oneCCL: ze_fd_manager.cpp:143 init_device_fds: EXCEPTION: opendir failed: could not open device directory
The cause is that device_dir is hard-coded to /dev/dri/by-path/, but the Docker container only maps /dev/dri from the host machine without mapping the by-path subfolder. There is therefore no /dev/dri/by-path/ in the container, which causes the error. I am not sure whether I have explained this clearly. Could you please share your thoughts on the problem?
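For anyone who wants to confirm the same situation in their own container, the following Python sketch lists a directory and reports the error if it cannot be opened, which roughly mirrors the opendir check that fails in the traceback. The helper name describe_device_dir is hypothetical and the script is only a diagnostic aid, not part of oneCCL.

```python
import os


def describe_device_dir(path="/dev/dri/by-path/"):
    """Report whether `path` can be listed, similar to an opendir() check."""
    try:
        entries = sorted(os.listdir(path))
    except OSError as exc:
        # Covers a missing directory, a non-directory, or missing permissions.
        return f"cannot open {path}: {exc.strerror}"
    return f"{path} contains {len(entries)} entries: {entries}"


if __name__ == "__main__":
    # Compare the parent directory with the by-path subfolder.
    print(describe_device_dir("/dev/dri/"))
    print(describe_device_dir("/dev/dri/by-path/"))
```

Running it inside the container shows whether /dev/dri is present while /dev/dri/by-path/ is missing, which is the condition described in this issue.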