hughperkins / cltorch

An OpenCL backend for torch.

I have multiple GPU, why test-device.lua only see 1 #74

Closed mw66 closed 7 years ago

mw66 commented 8 years ago

$ lspci

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290]
02:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)

$ luajit -l cltorch ./src/test/test-device.lua
running require cltorch...
... require cltorch done
num devices: 1
device properties, device 1
deviceType GPU
localMemSizeKB 48
globalMemSizeMB 4095
deviceVersion OpenCL 1.2 CUDA
platformVendor NVIDIA Corporation
deviceName GeForce GTX 960
maxComputeUnits 8
globalMemCachelineSizeKB 0
openClCVersion OpenCL C 1.2
maxClockFrequency 1367
maxMemAllocSizeMB 1023
maxWorkGroupSize 1024
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce GTX 960
c1
7
-4
5
[torch.ClTensor of size 3]

7
4
5
[torch.ClTensor of size 3]

Thanks.

hughperkins commented 8 years ago

Hi Mingwu,

This is most likely because the OpenCL drivers for that GPU have not been installed. Can you provide the output of clinfo, please? (You might need to run sudo apt-get install clinfo first.)

mw66 commented 8 years ago

Indeed I need to run with 'sudo su', and then it can see 2 GPUs.

luajit -l cltorch ./src/test/test-device.lua

running require cltorch...
... require cltorch done
num devices: 2
device properties, device 1
deviceType GPU
localMemSizeKB 48
globalMemSizeMB 4095
deviceVersion OpenCL 1.2 CUDA
platformVendor NVIDIA Corporation
deviceName GeForce GTX 960
maxComputeUnits 8
globalMemCachelineSizeKB 0
openClCVersion OpenCL C 1.2
maxClockFrequency 1367
maxMemAllocSizeMB 1023
maxWorkGroupSize 1024
device properties, device 2
deviceType GPU
localMemSizeKB 32
globalMemSizeMB 4052
deviceVersion OpenCL 2.0 AMD-APP (1598.5)
platformVendor Advanced Micro Devices, Inc.
deviceName Hawaii
maxComputeUnits 40
globalMemCachelineSizeKB 0
openClCVersion OpenCL C 2.0
maxClockFrequency 947
maxMemAllocSizeMB 2867
maxWorkGroupSize 256
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce GTX 960
c1
7
-4
5
[torch.ClTensor of size 3]

7
4
5
[torch.ClTensor of size 3]

Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Hawaii
c1
7
-4
5
[torch.ClTensor of size 3]

7
4
5
[torch.ClTensor of size 3]

Now my question is: why can I only run OpenCL on the R9 290 as root ('su'), but not as an ordinary user?

Is there any way I can fix this?

Thanks.

hughperkins commented 8 years ago

> BTW, I'm 'ssh -X' into that machine, and running with 'sudo'.
>
> Does it matter?

Yes, probably. GPU drivers in general tend to expect to be used from a desktop environment that is running on that GPU. As far as I know, AMD GPUs are no less picky in this respect. Unfortunately I don't have an AMD GPU to test with, and previous threads on the AMD list, e.g. https://github.com/amd/OpenCL-caffe/issues/13, never seemed to get resolved. I'm not really sure how to solve this, to be honest.

For NVIDIA, on Linux, it tends to be sufficient to just run with sudo.

For NVIDIA, on Windows, I find I have to use VNC to connect to the desktop (e.g. rdesktop doesn't work correctly; it somehow inserts some other driver into the video stack).

Maybe you could try VNC? The best thing would be to check with a support person, but there are a lot of AMD people in the thread I just linked to, so ... ?

mw66 commented 8 years ago

Guess I have to use 'sudo su'.

The next question is how do I tell cltorch which GPU device to run?

hughperkins commented 8 years ago

> Guess I have to use 'sudo su'.

Cool. If that works, that's excellent info :-)

> The next question is how do I tell cltorch which GPU device to run?

In theory it should be like:

cltorch.setDevice(2)

... for the second gpu, or:

cltorch.setDevice(1)

... for the first one

mehditlili commented 5 years ago

Just a small comment, as I was also facing this problem. First, running scripts with sudo on Linux is kind of unsafe. I didn't test with NVIDIA or AMD graphics cards, but I had the same problem with Intel HD integrated graphics. The trick was to add the user to the group "video":

sudo usermod -a -G video $LOGNAME

and then to close the ssh session and reconnect. After that, clinfo displays your GPU with no need for sudo.

gslin commented 2 years ago

Although this issue is a little old, I would like to add some notes. Hope this can help others.

In my case /dev/kfd is root:render instead of root:video:

crw-rw---- 1 root render 237, 0 Feb 26 15:15 /dev/kfd

So in that case you would need to use sudo usermod -a -G render $LOGNAME instead.
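Since the right group apparently differs between systems (video in one report above, render in another), it may help to look up the owning group before running usermod. The following is only a rough sketch, assuming GNU stat and a Linux system. The function name check_gpu_node_access is made up for illustration, and the /dev/dri paths are an assumption; only /dev/kfd is mentioned in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which group owns a device node and whether
# the current login session already carries that group.
# Note: it only compares group names; it does not inspect the mode bits.
check_gpu_node_access() {
    local node="$1"
    local group

    if [ ! -e "$node" ]; then
        echo "$node: not present"
        return 2
    fi

    # GNU stat: %G prints the name of the group that owns the file.
    group="$(stat -c '%G' "$node")"

    # 'id -nG' without a user name lists the groups of the current process,
    # so a group added with usermod only shows up after logging in again.
    if id -nG | tr ' ' '\n' | grep -qxF "$group"; then
        echo "$node: owned by group '$group', current session is a member"
        return 0
    fi
    echo "$node: owned by group '$group', current session is NOT a member"
    echo "  possible fix: sudo usermod -a -G $group \$LOGNAME (then log in again)"
    return 1
}

# /dev/kfd is the node from this thread; the /dev/dri entries are assumed
# candidates and may not exist on every machine.
for node in /dev/kfd /dev/dri/card* /dev/dri/renderD*; do
    check_gpu_node_access "$node" || true
done
```

If a node is reported as "NOT a member" right after running usermod, that is expected until the ssh session has been closed and reopened, as noted earlier in the thread.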