Huge thanks for the implementation! I have a question about the single-GPU training time you mentioned.
I ran the same training procedure with batch size 96 on an RTX 2080 Ti as you did, but it took much longer than the training time you reported (12 hrs to reach ~10k training iterations).
I can't figure out the cause of this at all. Could you describe your training environment precisely?
My working environment is listed below.
Docker environment with
CUDA 10.1
cuDNN v7
ubuntu 18.04
python 3.8
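For anyone trying to compare speeds, a quick way to sanity-check is to time a handful of training steps and extrapolate to the full run. This is a minimal sketch (not from the repo); `step_fn` is a hypothetical placeholder for one iteration of the actual training loop. The reported figure of 12 hrs for ~10k iterations works out to roughly 4.3 s per iteration.

```python
import time

def time_iterations(step_fn, n=50):
    """Average wall-clock seconds per iteration over n calls of step_fn."""
    start = time.time()
    for _ in range(n):
        step_fn()
    return (time.time() - start) / n

def estimate_total_hours(seconds_per_iter, total_iters=10_000):
    """Extrapolate total training time from a measured per-iteration cost."""
    return seconds_per_iter * total_iters / 3600

# 12 hrs for ~10k iterations corresponds to about 4.32 s per iteration:
# estimate_total_hours(4.32) -> 12.0
```

Comparing your measured seconds-per-iteration against this baseline should tell you quickly whether the slowdown is in the training step itself (e.g. data loading or a CPU fallback) rather than elsewhere.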
@ivanvovk I'm facing the same problem. I also tried running it on an 8x A100 server, but it took 9 days to reach 200 iterations. Any pointers would be greatly appreciated.