Open JigneshChowdary opened 1 year ago
Hi @JigneshChowdary
What error are you facing?
I am getting the following error when running on 4 GPUs, although the code works smoothly on a single GPU:
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
Your help would be much appreciated @xuekt98 Thanks in advance!
Hi @eslambakr, I encountered the same issue when running my code on 4 GPUs. Did you manage to resolve it? If so, could you please share how?
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
Your help would be much appreciated @xuekt98 Thanks in advance!
Maybe the hyperparameters you are using are not suitable, and the model is too large to fit on one GPU. Try decreasing the batch size.
I want to run on 4 GPUs, not 1 GPU. I get the error "RuntimeError: Unable to find a valid cuDNN algorithm to run convolution". Could you please share how you resolved it?
Hi @xiaoxiaoyuii, unfortunately I didn't solve it, as I didn't have enough time to work on it. But I think it is doable: any code that runs on a single GPU can be converted to support distributed training. Sorry about that.
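For anyone attempting that conversion, here is a rough sketch of the data-splitting side of distributed training. It is not taken from the BBDM code; `shard_indices` and `per_gpu_batch_size` are hypothetical helper names. In real PyTorch code, `torch.utils.data.DistributedSampler` does the sharding (and also pads so every rank gets the same number of samples, which this sketch leaves out).

```python
def shard_indices(num_samples: int, world_size: int, rank: int) -> list[int]:
    """Return the sample indices one process (rank) would train on.

    Hypothetical helper: each rank takes every `world_size`-th sample,
    starting at its own rank, so the shards do not overlap.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(range(rank, num_samples, world_size))


def per_gpu_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch size evenly across GPUs (hypothetical helper)."""
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by world_size")
    return global_batch_size // world_size


if __name__ == "__main__":
    # Example: 10 samples split across 4 GPUs.
    for rank in range(4):
        print(rank, shard_indices(10, 4, rank))
    print(per_gpu_batch_size(8, 4))
```

Each GPU then holds only its own shard and its own per-GPU batch, which is also why the memory needed per GPU depends on the per-GPU batch size rather than the global one.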
Hi @xiaoxiaoyuii, I have solved the problem. Sometimes the VRAM is not enough to run the code, so to handle this issue you should decrease the batch size.
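If it helps, below is a minimal sketch of how one might search for a batch size that fits, assuming the cuDNN error above is a symptom of running out of GPU memory, as suggested in this thread. The names `largest_working_batch_size`, `is_memory_error`, and `run_step` are hypothetical and not part of BBDM; `run_step` stands in for whatever function runs one training step at a given batch size.

```python
from typing import Callable


def is_memory_error(exc: RuntimeError) -> bool:
    """Heuristic (an assumption): treat these messages as out-of-memory symptoms."""
    msg = str(exc).lower()
    return (
        "out of memory" in msg
        or "unable to find a valid cudnn algorithm" in msg
    )


def largest_working_batch_size(
    run_step: Callable[[int], None], start: int, minimum: int = 1
) -> int:
    """Halve the batch size until `run_step` stops raising a memory-like error."""
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)  # one trial training step at this batch size
            return batch_size
        except RuntimeError as exc:
            if not is_memory_error(exc):
                raise  # unrelated failure: do not hide it
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {minimum} fits in memory")
```

In practice you would usually just set the smaller batch size in the config once you know what fits. Errors that do not look memory-related are re-raised, so a genuine bug is not silently masked.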
Hi, I want to train your model on multiple GPUs, but I am getting errors. Can you help me with this?