yaoanderson closed this issue 5 years ago
Another question: I saw you mention the "-gpus 0,1" parameter in another crash issue. What is the difference between "-gpus 0,1" and "-i 1", and how should I use them?
My case: when I train with "-i 1" I can hardly do anything else, because my Mac responds too slowly. But when I train with "-gpus 0,1" I can keep working, and I do not know why. It seems training only uses GPU 0 (the Intel GPU) and not GPU 1 (the AMD GPU) when I use "-gpus 0,1".
Hi, this is a known issue: you have to have identical GPUs, and then it works :D. The nature of the problem is that there is only one OpenCL queue, and it works with multiple GPUs only when the kernels for all devices are compiled in exactly the same way. Then, and only then, it works! 🗡
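For reference, the two invocations discussed above might look roughly like this (the data file, cfg, and weights paths are assumptions for illustration, not taken from this thread):

```shell
# Train on a single OpenCL device, index 1 (e.g. the AMD GPU)
./darknet detector train cfg/obj.data cfg/yolov3.cfg darknet53.conv.74 -i 1

# Train on devices 0 and 1 at once; per this thread, that only works
# with identical GPUs, because there is a single OpenCL queue
./darknet detector train cfg/obj.data cfg/yolov3.cfg darknet53.conv.74 -gpus 0,1
```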
See my DreamPC setup:
And how it works:
Thanks! You may close this, because what you are requesting is not possible on your HW.
Sorry, I am confused and do not yet understand why what I am requesting is not possible on my HW.
Hi, it is possible when you have 2+ exactly identical GPUs. By HW I mean hardware. Thanks!
I got it, thanks sowson.
What about my crash issue?
Before we close... there is one more thing. I was not 100% right: in the Makefile or CMakeLists.txt you may enable a switch named MULTI_GPU, and then (after the fix I just made) you will be able to really run multi-GPU. However, this disables the sgemm implementation from clBLAS, because that one is broken when accessed by many devices at the same time. Instead, I wrote my own sgemm (a trivial one, without optimization). It works, and it shows that true multi-GPU will be possible once clBLAS is ready; for now the math will be correct, but the unoptimized sgemm will slow you down. One last thing about macOS: when you use -gpus 0,1 it slows down GPU 0, and you will suffer a lack of responsiveness even in the Console/Terminal. Thanks a lot, you have reported very nice issues so far!
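The "trivial sgemm" mentioned above is, in spirit, just the textbook triple loop. Here is a minimal C sketch of what such an unoptimized fallback computes (C = alpha·A·B + beta·C, row-major, no transposes); the function name is hypothetical, and the real replacement in the project is an OpenCL kernel, not host code:

```c
#include <assert.h>
#include <stddef.h>

/* Trivial, unoptimized SGEMM: C = alpha * A * B + beta * C.
   A is M x K, B is K x N, C is M x N, all row-major. */
static void sgemm_naive(size_t M, size_t N, size_t K,
                        float alpha, const float *A, const float *B,
                        float beta, float *C)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

With no blocking, vectorization, or local-memory tiling, this keeps the math correct but is much slower than an optimized BLAS, which matches the slowdown described above.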
Hi @sowson, I do not want to train my network on the default Intel GPU (device 0) because it is too slow, but training sometimes crashes when I use "-i 1" (the AMD 4 GB GPU). Please help me with this issue: can I avoid the crash via some configuration, or is there any other workaround? My goal is to train fast on my AMD GPU without crashes. Thanks!
I tried to address this with two things: first, some OpenCL setup in this project, and second, https://github.com/sowson/clBLAS, which is multi-GPU ready for this project. I even put a pull request in place. You may update to the source code I just pushed and then compile the mentioned clBLAS on your macOS; it should improve stability, not only for multi-GPU but overall. Thanks!
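The update-and-rebuild steps suggested above might look roughly like this (a sketch; the directory layout, cmake invocation, and darknet path are assumptions):

```shell
# Build and install the multi-GPU-ready clBLAS fork
git clone https://github.com/sowson/clBLAS
cd clBLAS/src
cmake . && make && sudo make install

# Then rebuild darknet against it (path is a placeholder)
cd /path/to/darknet
make clean && make
```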
OK, I will try. Thanks so much, sowson.
Running sudo cmake and sudo make / sudo make install in clBLAS/src succeeded:
Then I rebuilt my darknet project successfully:
But, unfortunately, it still fails. Please help check my steps and the failure, @sowson.
This only happens sometimes, so it is a second-priority issue; please help solve the other loss-value issue first. Thanks, sowson.
If there are no more issues here, can we close this one and focus on #16 instead? Thanks!
OK, let me focus on the other ticket.
Hi @sowson, I found that the program sometimes crashes when training my network, but I am not sure why. Do you have any idea about my issue?