Amir-Arsalan closed this issue 8 years ago
I recommend you check what criterion.sizeAverage does in the original Torch7 code. From that you can infer why the reconstructions are identical. Note that these GitHub issues are for technical problems, not for personal help.
@y0ast Sorry, maybe I did not phrase my question well. I know what sizeAverage does; what I wanted to know is why averaging the pixel-wise errors hinders learning.
Averaging scales the reconstruction term of the objective down massively, so the KLD term overwhelms the objective. The network then mainly optimizes the KLD, driving it to zero, and the reconstructions come out bad.
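A rough back-of-the-envelope sketch of this scale imbalance (the per-pixel BCE and KLD magnitudes below are illustrative assumptions, not measured values; 784 is the pixel count of a 28x28 MNIST image):

```python
# Hypothetical magnitudes to illustrate why sizeAverage = true
# lets the KLD term dominate the VAE objective.
num_pixels = 28 * 28          # 784 pixels per MNIST image
per_pixel_bce = 0.5           # assumed average per-pixel BCE (nats)
kld = 20.0                    # assumed KLD magnitude early in training

recon_sum = per_pixel_bce * num_pixels  # sizeAverage = false: summed BCE, ~392
recon_mean = per_pixel_bce              # sizeAverage = true:  averaged BCE, 0.5

# With summing, reconstruction dominates; with averaging, it is tiny
# relative to the KLD, so gradients mostly push the KLD toward zero.
print(recon_sum / kld)   # ratio with sizeAverage = false
print(recon_mean / kld)  # ratio with sizeAverage = true
```

So with averaging the reconstruction term is hundreds of times weaker, and the cheapest way to lower the total loss is to collapse the KLD to zero, which is exactly the behavior described below. A common workaround is to keep the summed reconstruction loss, or equivalently scale the KLD term down by the same factor.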
After setting criterion.sizeAverage = true, I noticed the KLD criterion consistently outputs ~0 after epoch 2-3, and the reconstructions are identical and do not make sense at all. I tried very small learning rates as well as bigger ones, and I still face the same issue. Why is that?