Hi,
When I run
th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json
I get the following error message at random points during training:
"THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-6961/cutorch/lib/THC/generic/THCStorage.c line=147 error=4 : unspecified launch failure"
Sometimes the error occurs after about 50000 iterations, sometimes after around 8000. After getting the "unspecified launch failure", I tried to run the train.lua script again, but then I got a different error message: "all CUDA-capable devices are busy or unavailable". I have to restart the computer before I can run the script again. Has anyone else run into the same problem?
Thank you!