lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

GTX 1080ti can't be used unless totally disabling Intel Integrated Graphics, OpenCL version #339

Open Feiyu-Chen-THU opened 4 years ago

Feiyu-Chen-THU commented 4 years ago

I ran ./katago/katago gtp -model katanetwork.gz in PowerShell and got

KataGo v1.6.1
Using TrompTaylor rules initially, unless GTP/GUI overrides this

then the process automatically stopped after a while.

Also, I ran ./katago/katago benchmark -model katanetwork.gz and got

2020-11-02 10:59:01+0800: Loading model and initializing benchmark...
2020-11-02 10:59:01+0800: nnRandSeed0 = 6011439531775415837
2020-11-02 10:59:01+0800: After dedups: nnModelFile0 = katanetwork.gz useFP16 auto useNHWC auto

then the process stopped.

This problem suddenly appeared yesterday; everything was normal before (my last use before yesterday was about a month ago). Leela Zero crashes in the same way. Now neither engine can be loaded by Lizzie or Sabaki. I've tried changing configs and weights, and re-downloading all the files for KataGo, Leela Zero, Lizzie, and Sabaki, but nothing works.

My OS: Windows 10.0.17134.1130
My graphics cards: Intel Integrated Graphics + GTX 1050ti

I'm not sure whether this problem is related to the Intel integrated graphics, as you have mentioned in other issues. After all, everything was normal and the GTX 1050ti was used correctly before yesterday. I don't know what changed: there was no system update, and during the past month I only installed some software I need for work, which I didn't think would affect this. Do you have any idea about this? Thank you very much!

Feiyu-Chen-THU commented 4 years ago

I've now narrowed down the problem. I disabled the Intel integrated graphics in Device Manager and everything went back to normal. But how can I solve the problem without disabling the integrated graphics? I manually set katago.exe to run on my Nvidia GPU, but that didn't work. Can anybody tell me what setting would fix this?

Friday9i commented 4 years ago

If you use OpenCL, did you try adding "openclDeviceToUse = 0" to the config file (e.g. "gtp_example.cfg")? Or "openclDeviceToUse = 1"? You just have to remove the "#" character at the start of that line. I guess the Intel integrated graphics is device 0 and the GTX is device 1, so it will probably work with 1. Note: if you use CUDA, the same applies: try changing the device to 1 if it doesn't work.
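
The config edit described above can be sketched in Python. This is only an illustration: it assumes the setting appears on its own line in the form "# openclDeviceToUse = 0" (as the comment above implies), and the helper name `set_opencl_device` and the sample config text are made up for the example.

```python
import re

# Matches an optionally commented-out "openclDeviceToUse = N" line.
# Assumption: the key sits on its own line, e.g. "# openclDeviceToUse = 0".
_DEVICE_LINE = re.compile(r"^[ \t]*#?[ \t]*openclDeviceToUse[ \t]*=.*$", re.MULTILINE)


def set_opencl_device(config_text: str, device_index: int) -> str:
    """Return config_text with openclDeviceToUse uncommented and set."""
    new_line = f"openclDeviceToUse = {device_index}"
    updated, count = _DEVICE_LINE.subn(new_line, config_text, count=1)
    if count == 0:
        # The key is not present at all, so append it at the end.
        sep = "" if config_text.endswith("\n") or not config_text else "\n"
        updated = config_text + sep + new_line + "\n"
    return updated


# Hypothetical config snippet, for illustration only.
sample = "numSearchThreads = 6\n# openclDeviceToUse = 0\n"
print(set_opencl_device(sample, 1))
```

Only the first matching line is rewritten, so any other settings in the file are left alone.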

Feiyu-Chen-THU commented 4 years ago

Thank you @Friday9i. I use OpenCL, and I've tried setting "openclDeviceToUse" to 0, 1, 2, 3, ... in turn (of course I removed the "#"). Unfortunately, none of them works. So far the only thing I've found that works is disabling the integrated graphics, but I really don't want to do that, lol

Feiyu-Chen-THU commented 4 years ago

More testing: when I disabled the integrated graphics and set openclDeviceToUse = 0, everything was normal. Then I set openclDeviceToUse = 1 and got

KataGo v1.6.1
Using TrompTaylor rules initially, unless GTP/GUI overrides this
Uncaught exception: Requested gpuIdx/device 1 was not found, valid devices range from 0 to 0

This exception was expected.

However, if I don't disable the integrated graphics, no matter what number I set for openclDeviceToUse (e.g. 500), I would only get

KataGo v1.6.1
Using TrompTaylor rules initially, unless GTP/GUI overrides this

then the process stopped. There wasn't even an exception about the GPU device number.

Maybe this information is helpful.
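
The two outcomes described above (a clear exception for an out-of-range index versus an exit with no message at all) can be told apart mechanically. The following Python sketch is a hypothetical helper, not part of KataGo. It recognizes the "Requested gpuIdx/device" message quoted above; the label names and the idea that a silent crash shows up as a nonzero exit code are assumptions, since the thread does not say what exit code the process returned.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# The exception text quoted in this thread.
_RANGE_ERROR = re.compile(
    r"Requested gpuIdx/device (\d+) was not found, "
    r"valid devices range from (\d+) to (\d+)"
)


def parse_device_range(output: str) -> Optional[Tuple[int, int, int]]:
    """Return (requested, lowest_valid, highest_valid), or None if absent."""
    match = _RANGE_ERROR.search(output)
    if match is None:
        return None
    requested, low, high = (int(g) for g in match.groups())
    return requested, low, high


def classify_run(output: str, returncode: Optional[int]) -> str:
    """Label a finished (or timed-out) run; the labels are illustrative."""
    if parse_device_range(output) is not None:
        return "bad-device-index"
    if "Uncaught exception" in output:
        return "other-exception"
    if returncode is None:
        return "timed-out"
    if returncode != 0:
        return "silent-exit"  # assumption: a crash yields a nonzero code
    return "clean-exit"


def run_command(args: List[str], timeout_s: float = 60.0) -> Tuple[str, Optional[int]]:
    """Run args and return (stdout + stderr, exit code); code is None on timeout."""
    try:
        done = subprocess.run(
            args,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            stdin=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # Partial output is dropped here to keep the sketch simple.
        return "", None
    return done.stdout + done.stderr, done.returncode
```

For example, `classify_run(*run_command(["./katago/katago", "benchmark", "-model", "katanetwork.gz"]))` would label the benchmark run from the first post, given a timeout long enough for the benchmark to finish.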

mega-optimus commented 4 years ago

Try this tool to list all OpenCL devices: https://github.com/Oblomov/clinfo

mega-optimus commented 4 years ago

You may need to configure this in the "Nvidia Control Panel".

Feiyu-Chen-THU commented 4 years ago

Thank you, @mega-optimus! I've tried changing the preferred graphics processor in Nvidia Control Panel, but it didn't work. So far, disabling the integrated graphics in Device Manager is the only thing that solves the problem. As for the "clinfo" tool, I think it would help to check the OpenCL status, but I don't know how to use it on Windows :(

mega-optimus commented 4 years ago

The description says to run something like "clinfo -l" or "clinfo -a" on the command line.
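
To compare what "clinfo -l" reports with the device index in the config, its output can be parsed into (platform, device) pairs. The Python sketch below is a rough illustration: the sample text is hypothetical and follows the "Platform #N:" / "Device #N:" layout that "clinfo -l" prints as far as I know, and the names on a real machine will differ. How KataGo maps these entries onto openclDeviceToUse is not stated in this thread, so treat the result only as a list of what OpenCL can see.

```python
import re
from typing import List, Tuple

_PLATFORM = re.compile(r"^Platform #(\d+):\s*(.+?)\s*$")
_DEVICE = re.compile(r"Device #(\d+):\s*(.+?)\s*$")


def parse_clinfo_list(text: str) -> List[Tuple[int, str, int, str]]:
    """Return (platform_index, platform_name, device_index, device_name) tuples."""
    devices = []
    platform = None
    for line in text.splitlines():
        p = _PLATFORM.match(line)
        if p:
            platform = (int(p.group(1)), p.group(2))
            continue
        d = _DEVICE.search(line)
        if d and platform is not None:
            devices.append((platform[0], platform[1], int(d.group(1)), d.group(2)))
    return devices


# Hypothetical output, assumed to resemble what "clinfo -l" prints.
SAMPLE = """\
Platform #0: Intel(R) OpenCL
 `-- Device #0: Intel(R) HD Graphics 630
Platform #1: NVIDIA CUDA
 `-- Device #0: GeForce GTX 1050 Ti
"""

for entry in parse_clinfo_list(SAMPLE):
    print(entry)
```

On a real system the input text could come from `subprocess.run(["clinfo", "-l"], capture_output=True, text=True).stdout` instead of the sample string.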

Feiyu-Chen-THU commented 4 years ago

> I haven't used the clinfo command line tool, but the description says something like "clinfo -l -a"

I don't even know where to type the command. I don't know much about programming, so I have no idea how to use the files I downloaded from clinfo. I tried running fetch-opencl-dev-win.cmd and make.cmd, but I'm still confused about how to enter the command.

mega-optimus commented 4 years ago

There is a Windows exe download link at the bottom of the page, so there is no need to compile it yourself.

Feiyu-Chen-THU commented 4 years ago

@mega-optimus I've tried that, but clinfo.exe crashed right after starting. It seems the OpenCL SDK is needed for this tool. But as far as I know (I'm not sure), installing the OpenCL SDK is not necessary for running Go engines like KataGo or Leela Zero. Anyway, I'm not sure I'll spend more time on this tool. Thank you all the same.

mega-optimus commented 4 years ago

I tried this on my laptop, which does not have the OpenCL SDK, without any problem.