Representation Collapse

First of all, thank you so much for making this work open source!

My question isn't directly about your implementation but about the paper itself. I didn't know where else to ask, and I figured you'd have a pretty good understanding of it. That said, feel free to close this if you think it's inappropriate.

I recently read the paper, and I don't understand why the network doesn't cheat by simply learning to output 0s, or, in the authors' words, by learning "collapsed representations." Any ideas?
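To make the worry concrete, here is a minimal pure-Python sketch. It assumes the normalized mean-squared-error loss described in the BYOL paper (2 minus twice the cosine similarity between the online network's prediction and the target network's projection). `collapsed_network`, the view vectors, and the epsilon handling are hypothetical stand-ins, not code from PyTorch-BYOL.

```python
import math

# Hypothetical epsilon, chosen to mirror the default eps of torch.nn.functional.normalize.
EPS = 1e-12


def l2_normalize(v):
    """Scale v to unit length; a zero vector is divided by EPS and stays zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, EPS) for x in v]


def byol_loss(online_prediction, target_projection):
    """Normalized MSE as described in the BYOL paper: 2 - 2 * cosine similarity."""
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


def collapsed_network(view):
    """Hypothetical collapsed network: ignores its input and returns a constant vector."""
    return [0.5, -1.0, 2.0]


# Two different augmented views of the same image (placeholder values).
view_1 = [0.1, 0.7, 0.3]
view_2 = [0.9, 0.2, 0.4]

# A constant output makes both branches agree perfectly, so the loss is ~0.
collapsed_loss = byol_loss(collapsed_network(view_1), collapsed_network(view_2))
print("constant output gives ~0 loss:", math.isclose(collapsed_loss, 0.0, abs_tol=1e-9))

# Literal all-zero outputs have no direction to compare, so under this normalization the loss is 2.
zero_loss = byol_loss([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print("all-zero output loss:", zero_loss)
```

In this sketch, any network that emits the same nonzero vector for every input drives the loss to (numerically) zero, which is the "collapsed" solution the question is about. Because the outputs are normalized, the trivial minimizer here is a constant direction rather than literal zeros.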

sthalles / PyTorch-BYOL, issue #3