Open JeremyCJM opened 1 year ago
This trick is also used in GLIDE and other Text-to-Image Generation. Zero out self.out
can enforce the output to be a zero vector. The learning target of our model is the added Gaussian Noise, whose expected mean is also zero. Therefore, our model can be trained steadily with this trick.
Hi Mingyuan,
Why zero out the parameters of the "self.out" projection module in transformers.py?
Thanks, Jeremy