lucidrains / mixture-of-experts

A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
MIT License
628 stars 49 forks source link

implicit inplace operation '*=' cause an error when deriving the back gradient in pytorch #6

Closed VRCMF closed 2 years ago

VRCMF commented 3 years ago

In the code Error, it cause the failure of deriving the back gradient.

Solution: density_1_proxy = density_1_proxy*equals_one_mask[..., None]

lucidrains commented 2 years ago

@VRCMF thanks Wei! https://github.com/lucidrains/mixture-of-experts/commit/04201ee9d1add6913032b3cb75c7b59b289b16de