Open chienlinhuang1116 opened 8 years ago
Hi,
We found that the cause is the following error: "/ipc/DiscoveredTree.lua:15: ERROR: (/home/chienh/big/twitter/torch-ipc/src/cliser.c, 318): (9, Bad file descriptor)".
This error only happens when the server is busy with other jobs. Do you have any idea what might cause it?
Thank you, Chien-Lin
Hi,
I want to run on 6 GPUs, which starts 6 luajit jobs. However, the system sometimes starts only 5 of them. For now, I restart the training whenever this happens. Do you have any idea?
Thank you, Chien-Lin