zhihuilics closed this issue 6 years ago.
@zhihuilics "no checkpoint found" just means you are training a new model; it is not an error. It seems that your dataloader gets stuck at the beginning of the first epoch. Please check the size of your shared memory segment (df -h | grep shm).
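To illustrate why that message is harmless, here is a minimal sketch of the resume-or-start pattern that training scripts commonly use. It is not this repository's actual code: `resume_or_start` and `load_fn` are hypothetical names, and in a real PyTorch script `load_fn` would typically be `torch.load`.

```python
import os


def resume_or_start(checkpoint_path, load_fn):
    """Return (state, resumed) for a hypothetical training script.

    load_fn is a hypothetical callable that reads the checkpoint file
    (in a real PyTorch script this would usually be torch.load).
    """
    if os.path.isfile(checkpoint_path):
        print(f"=> loading checkpoint '{checkpoint_path}'")
        return load_fn(checkpoint_path), True
    # Not an error: there is simply nothing to resume from,
    # so training starts from freshly initialised weights.
    print(f"=> no checkpoint found at '{checkpoint_path}'")
    return None, False
```

On a first run, `resume_or_start("logs/voc2007/model_best.pth.tar", load_fn=...)` finds no file, prints the same message seen in the logs, and training proceeds from scratch. A hang after that message therefore points at the data loading stage, not at the checkpoint.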
@yeezhu
I'm running into the same problem; see below for more details:
=> no checkpoint found at 'logs/voc2007/model_best.pth.tar'
Training: 0%| | 0/79 [00:00<?, ?it/s]
~$ df -h | grep shm
tmpfs 2.0G 42M 1.9G 3% /dev/shm
I'm not sure what the right size for the shared memory segment is. Thanks.
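As a minimal sketch, the same check as `df -h | grep shm` can be done from Python with the standard library, for example at the start of a training script. This assumes a Linux system where shared memory is mounted at `/dev/shm`; `shm_report` is a hypothetical helper name. It only reports the numbers and does not tell you the "right" size, which depends on batch size, image size and the number of dataloader workers. Commonly used workarounds are enlarging the mount (e.g. `docker run --shm-size=...` inside Docker) or lowering the PyTorch `DataLoader` `num_workers` setting.

```python
import shutil


def shm_report(path="/dev/shm"):
    """Report total/used/free space of a shared-memory mount in GiB.

    Returns None if the path does not exist (e.g. on non-Linux systems).
    """
    try:
        usage = shutil.disk_usage(path)
    except FileNotFoundError:
        return None
    gib = 1024 ** 3
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
    }


if __name__ == "__main__":
    report = shm_report()
    if report is None:
        print("/dev/shm not found")
    else:
        print("{total_gib:.1f}G total, {free_gib:.1f}G free".format(**report))
```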
In addition, it also fails with:
=> RuntimeError: CUDNN_STATUS_ALLOC_FAILED
Could you please give me advice on how to proceed?
@yeezhu I have solved the problem, thank you all the same. @zhihuilics, have you solved it?
@zhiweichen12 Hello, I'm glad that you've solved your problem :P
@zhiweichen0012 Hello, how did you solve it? Could you share what you did?
I tried to run the demo, but it got stuck at:
=> no checkpoint found at 'logs/voc2007/model_best.pth.tar'
Could you please give me advice on how to proceed?