First of all, thank you so much for making this work open source!
My question isn't directly related to your implementation; it's about the paper itself. I didn't know where else to ask, and I figured you'd have a pretty good understanding of the paper. That said, feel free to close this if you think it's inappropriate.
I recently read the paper, and I don't understand why the network doesn't cheat by simply learning to output 0s, or, in the paper's own words, by learning "collapsed representations." Any ideas?