beniz closed this 11 months ago
This PR replaces some BatchNorm layers with GroupNorm so that multi-GPU training doesn't require as many sync calls.
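As a rough illustration of the kind of swap described here, the sketch below assumes a PyTorch model with `nn.BatchNorm2d` layers. The helper name `batchnorm_to_groupnorm` and the `max_groups` parameter are hypothetical and not taken from this PR, which only converts some of the layers. GroupNorm computes its statistics per sample, so there are no batch statistics to reduce across GPUs.

```python
import torch
import torch.nn as nn


def batchnorm_to_groupnorm(module: nn.Module, max_groups: int = 8) -> nn.Module:
    """Recursively replace BatchNorm2d layers with GroupNorm.

    Hypothetical helper for illustration; not the code from this PR.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            channels = child.num_features
            # GroupNorm requires the channel count to be divisible by the
            # group count, so pick the largest valid divisor <= max_groups.
            groups = max(
                g for g in range(1, min(max_groups, channels) + 1)
                if channels % g == 0
            )
            new_layer = nn.GroupNorm(
                groups, channels, eps=child.eps, affine=child.affine
            )
            setattr(module, name, new_layer)
        else:
            batchnorm_to_groupnorm(child, max_groups)
    return module


# Toy model, assumed for the example only.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)
model = batchnorm_to_groupnorm(model)

x = torch.randn(4, 3, 8, 8)
full = model(x)       # whole batch on one device
half = model(x[:2])   # the same samples seen as a smaller per-device batch
```

Because each sample is normalized independently, the output for a given sample should not depend on which other samples share its batch, which is why no cross-device reduction is needed. Note that the converted layers start from fresh GroupNorm parameters; the BatchNorm running statistics are not carried over in this sketch.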