alchemz closed this issue 4 years ago.
After setting `num_workers=0`, the hang after epoch 1 is resolved. However, the val_loss plot still does not show up until several epochs later.
@harryhan618 Could you explain why you put `tensorboard.scalar_summary("val_loss", val_loss, iter_idx)` outside the for loop at line 184 of train.py?
@alchemz Hi! I put the TensorBoard logging outside the validation loop because, for validation, I only want to know the loss over the whole epoch. The loss from a single validation iteration is meaningless.
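The "log once per epoch, after the loop" pattern described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: `validate_epoch`, `compute_loss`, and `log_scalar` are hypothetical names, with `log_scalar(tag, value, step)` standing in for the repo's `tensorboard.scalar_summary`, and the epoch loss assumed to be a plain mean of per-batch losses (not weighted by batch size). Because only one `val_loss` point is written per validation pass (at step `iter_idx`), the chart likely needs a few epochs before it shows a visible curve.

```python
from typing import Callable, Iterable, List, Tuple


def validate_epoch(
    val_batches: Iterable[object],
    compute_loss: Callable[[object], float],
    log_scalar: Callable[[str, float, int], None],
    iter_idx: int,
) -> float:
    """Average the per-batch validation loss and log it once per epoch.

    compute_loss and log_scalar are hypothetical stand-ins: the first for the
    model's forward pass plus loss, the second for a TensorBoard
    scalar-logging helper with a (tag, value, step) signature.
    """
    total = 0.0
    count = 0
    for batch in val_batches:
        # Only accumulate inside the loop; no per-iteration logging.
        total += compute_loss(batch)
        count += 1
    # Unweighted mean over batches; NaN if the validation set was empty.
    val_loss = total / count if count else float("nan")
    # A single scalar per validation pass, written after the loop finishes.
    log_scalar("val_loss", val_loss, iter_idx)
    return val_loss


if __name__ == "__main__":
    # Usage: fake "batches" whose loss is the batch value itself.
    logged: List[Tuple[str, float, int]] = []
    loss = validate_epoch(
        [1.0, 2.0, 3.0],
        compute_loss=lambda b: b,
        log_scalar=lambda tag, value, step: logged.append((tag, value, step)),
        iter_idx=500,
    )
    print(loss, logged)  # 2.0 [('val_loss', 2.0, 500)]
```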
Hi Harry,
I have checked all the existing issues and found no solution to this problem. When I launch TuSimple training with train.py, after train epoch #0 and val epoch #0 it always gets stuck at train epoch #1. Also, are the results you report in README.md from only one epoch as well?
The following is the TensorBoard output; as you can see, there are no val loss logs. Is this normal?