**Open** · JinshuChen opened this issue 3 years ago
Hi, may I ask if you have successfully used the loss?
Hi everyone. I got the error saying "RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable)." when I try to apply contrastive learning loss with DDP. So any idea? TAT