keivanB opened this issue 4 years ago
Do you have several GPUs? Maybe you will need to add custom options (which are forwarded to the FAHClient binary).
Do you get the nvidia-smi output from inside the container as well?
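For example, a minimal sketch of passing custom options (the image name here is a placeholder for whatever you built or pulled; everything after it is passed through to FAHClient):

```
# Sketch: arguments after the image name are forwarded to the FAHClient binary.
# "foldingathome/fah-gpu" is a placeholder image name.
docker run -d --gpus all foldingathome/fah-gpu --user=Anonymous --team=0 --gpu=true
```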
Yep, I have multiple GPUs, and I can get the nvidia-smi output inside the container as well.
```
15:14:48: GPUs: 4
15:14:48: GPU 0: Bus:10 Slot:0 Func:0 AMD:4 Cedar PRO [Radeon HD 5450]
15:14:48: GPU 1: Bus:11 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K40m]
15:14:48: GPU 2: Bus:65 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K40m]
15:14:48: GPU 3: Bus:66 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K40m]
15:14:48: CUDA Device 0: Platform:0 Device:0 Bus:11 Slot:0 Compute:3.5 Driver:10.2
15:14:48: CUDA Device 1: Platform:0 Device:1 Bus:65 Slot:0 Compute:3.5 Driver:10.2
15:14:48: CUDA Device 2: Platform:0 Device:2 Bus:66 Slot:0 Compute:3.5 Driver:10.2
15:14:48: OpenCL Device 0: Platform:0 Device:0 Bus:11 Slot:0 Compute:1.2 Driver:440.59
15:14:48: OpenCL Device 1: Platform:0 Device:1 Bus:65 Slot:0 Compute:1.2 Driver:440.59
15:14:48: OpenCL Device 2: Platform:0 Device:2 Bus:66 Slot:0 Compute:1.2 Driver:440.59
```
I am testing this image in order to scale it to Chameleon Cloud servers. We have some capacity and are trying to help. The output above is from the local server I am running the tests on, but we have multiple Tesla P100 GPU nodes available, so it would be the same situation: at least two GPUs per node. I would really appreciate your help getting this up and running so I can scale it a little bit.
From inside the Docker image:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   42C    P0    66W / 235W |     96MiB / 11441MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 00000000:41:00.0 Off |                    0 |
| N/A   27C    P8    19W / 235W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K40m          Off  | 00000000:42:00.0 Off |                    0 |
| N/A   31C    P8    21W / 235W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
Can you try to run foldingathome directly on your system, without containers? If so, does it pick up all GPUs?
Another idea would be to use several containers, and let each one only work on one GPU. So you would start the first one with parameter "--opencl-index 0", the second with "--opencl-index 1" etc.
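A sketch of that approach, assuming all GPUs stay exposed to each container (the image name is again a placeholder):

```
# Sketch: three containers share the GPUs, but each FAHClient instance is
# pinned to a different OpenCL device via --opencl-index.
# "foldingathome/fah-gpu" is a placeholder image name.
docker run -d --gpus all --name fah0 foldingathome/fah-gpu --opencl-index 0
docker run -d --gpus all --name fah1 foldingathome/fah-gpu --opencl-index 1
docker run -d --gpus all --name fah2 foldingathome/fah-gpu --opencl-index 2
```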
Finally, maybe that Radeon card is disturbing it. I guess it's not usable from inside the container? That might need some other options or a customised config.xml. For testing, you can always edit (or docker cp) the config.xml file and adapt the "slots", then restart the container.
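A rough sketch of that edit cycle ("fah" and the in-container path are assumptions; adjust them to your container name and image layout):

```
# Sketch: copy the config out, adapt the <slot> entries, copy it back, restart.
docker cp fah:/fah/config.xml .
# ...edit config.xml, e.g. give the GPU slot an explicit <opencl-index v='0'/>...
docker cp config.xml fah:/fah/config.xml
docker restart fah
```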
Sorry, my personal experience with foldingathome is very limited! After a brief try on a home PC in the '00s, I switched to BOINC and fully open-source projects. It's only now that I have come back to it, and the first thing I did was put it in a container.
I am testing the image on a device with an NVIDIA P100. The system has the NVIDIA driver installed and I get the nvidia-smi output, but Docker seems to fail to work properly with the GPU:
```
16:48:19:ERROR:WU02:FS02:Failed to start core: OpenCL device matching slot 2 not found, try setting 'opencl-index' manually
```
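One way to narrow this down is to list the OpenCL devices the container actually sees (a sketch; "fah" is a placeholder container name, and clinfo may need to be installed in the image first):

```
# Sketch: list OpenCL platforms/devices visible from inside the container.
docker exec -it fah clinfo -l
```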