Open Esaada opened 6 years ago
Lower the `batch_size`. Also, you have two GPUs, so either set `CUDA_VISIBLE_DEVICES` so that only one of the GPUs is visible, or use `--worker_gpu=2`.
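For concreteness, a minimal sketch of the two options (the remaining flags are elided and would depend on your setup):

```bash
# Option 1: expose only the first GPU to TensorFlow, then train as usual.
export CUDA_VISIBLE_DEVICES=0
t2t-trainer --worker_gpu=1 ...   # remaining flags unchanged

# Option 2: keep both GPUs visible and tell T2T to actually compute on both.
t2t-trainer --worker_gpu=2 ...   # remaining flags unchanged
```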
Thanks, but I mentioned that I decreased my batch size to 1, and you can see from the error log that worker_gpu=2. But still, thanks.
I missed the comment about batch_size=1, but I don't see `--hparams='batch_size=1'` in your log. I think that `INFO:tensorflow:worker_gpu=2` means that TF sees two GPUs and will allocate memory on both GPUs, but without `t2t-trainer --worker_gpu=2`, T2T will compute on one GPU only. You should also make sure there is no other process running (and taking memory) on the GPU(s).
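As a sketch of what that looks like on the command line (the problem, model, and directory names below are placeholders, not the reporter's actual setup, and exact flag names can vary slightly between T2T versions):

```bash
# Check that no other process is already holding GPU memory.
nvidia-smi

# Pass the small batch size and the GPU count explicitly to t2t-trainer.
t2t-trainer \
  --data_dir=$HOME/t2t_data \
  --output_dir=$HOME/t2t_train \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --hparams='batch_size=1' \
  --worker_gpu=2
```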
Thanks again. I did write it in my t2t-trainer command; it's shown here, I'll fix it. And when I run nvidia-smi, only 1% of the GPU memory is used. That's the most annoying thing, it doesn't make any sense!
@BarakBat have you fixed the issue? I am facing the same issue. I specified `--hparams="batch_size=1" --hparams_set=transformer_base_single_gpu`.
Description
When trying to train on a GPU, I get this error: `Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR`. I saw a few posts about this problem, but no one got an answer, so if someone has solved it, I'd be happy for their kind help.
Comment: I decreased my batch size to 1.
Environment information