Closed. szh-bash closed this 6 months ago.
The cause was incorrect handling of the parameter names in arcface_resnet.pth ('module.xx.xxx') by the code I added a few days ago:
self.facenet = nn.DataParallel(self.facenet) # modified to module.weight (loss/identity.py)
After deleting this line and handling the 'module.' prefix in arcface_resnet.pth correctly, everything works again.
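For anyone hitting the same thing, here is a minimal sketch of one common way to handle the 'module.' prefix: strip it from the checkpoint keys before loading, instead of wrapping the network in nn.DataParallel. This is not necessarily the exact fix used in the repo. The helper name strip_module_prefix is hypothetical, and the sketch assumes arcface_resnet.pth holds a plain state dict whose keys look like 'module.xx.xxx'.

```python
from collections import OrderedDict

# Prefix that nn.DataParallel adds to every parameter name when a wrapped
# model's state dict is saved.
PREFIX = "module."


def strip_module_prefix(state_dict):
    """Return a copy of state_dict with a leading 'module.' removed from each key.

    Hypothetical helper for illustration; keys without the prefix are kept as-is.
    """
    cleaned = OrderedDict()
    for name, value in state_dict.items():
        if name.startswith(PREFIX):
            name = name[len(PREFIX):]
        cleaned[name] = value
    return cleaned


# Assumed usage (requires PyTorch; the unwrapped `facenet` model is an assumption):
# import torch
# checkpoint = torch.load("arcface_resnet.pth", map_location="cpu")
# facenet.load_state_dict(strip_module_prefix(checkpoint))
```

With the keys normalized this way, the weights load into the unwrapped model, so the nn.DataParallel line is no longer needed just to make the names match.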
@theEricMa My server environment can run single-GPU training, but multi-GPU training fails. The error occurred after I changed nproc_per_node from 1 to 4.
The full log is in err.log. What could be the possible reasons?