vacancy / Synchronized-BatchNorm-PyTorch

Synchronized Batch Normalization implementation in PyTorch.
MIT License

When I use sync BN on 8 GPUs, training sometimes stops. #11

Closed · yu-changqian closed this issue 6 years ago

yu-changqian commented 6 years ago

Environment:
- GPU: 8 x 1080 Ti
- CUDA version: 9.0
- PyTorch version: 0.4.1

Experiment config:
- batch size: 16
- num workers: 16
- input size: 480x480

When I use sync BN on the ADE20K dataset, my experiment stops at a certain iteration without producing any further output, and GPU utilization drops to 0. Have you had a similar experience?
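
For context, here is a minimal sketch of the multi-GPU setup this library expects, using the `convert_model` and `DataParallelWithCallback` helpers named in the repository README. The comments about the hang are an assumption about one common cause of such deadlocks, not a confirmed diagnosis of this issue:

```python
# Minimal sketch (assumption: the hang comes from a replica that never
# reaches the BN synchronization barrier, e.g. when plain nn.DataParallel
# is used instead of DataParallelWithCallback, or when the final batch is
# too small to be split across all 8 GPUs).
import torch.nn as nn
from sync_batchnorm import convert_model, DataParallelWithCallback

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),   # replaced by SynchronizedBatchNorm2d below
    nn.ReLU(inplace=True),
)

# Replace every nn.BatchNorm*d layer with its synchronized counterpart.
model = convert_model(model)

# Use DataParallelWithCallback instead of nn.DataParallel so the
# synchronization context is re-initialized after every replicate() call.
model = DataParallelWithCallback(model, device_ids=list(range(8))).cuda()

# drop_last=True guarantees every batch can be split across all 8 GPUs;
# an undersized final batch is one way to starve the BN barrier
# (an assumption here, worth ruling out cheaply).
# loader = DataLoader(dataset, batch_size=16, num_workers=16, drop_last=True)
```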

vacancy commented 6 years ago

Is this comment related? https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/issues/3#issuecomment-412139776

vacancy commented 6 years ago

Closing the issue for now. Feel free to reopen it if you still have questions.