zmtomorrow / ImprovingVAERepresentationLearning


Excellent work! #1

Open xiao7199 opened 2 years ago

xiao7199 commented 2 years ago

Thanks for sharing the code — this is very interesting work.

I have two questions regarding the training: (1) Have you tried different target functions for the PixelCNN? E.g., a softmax with 255*3 categories (the one used in PixelCNN), or an L2 loss? I have tried those two target functions in your code but got pretty bad classification results. I'd just like to check whether you observed similar behavior in your experiments.

(2) Have you tried other network structures for the encoder or decoder? E.g., a standard resnet for the encoder and a mirrored resnet for the decoder? I don't get good classification results with a standard resnet.

Thanks!

zmtomorrow commented 2 years ago

Thanks for your interest!

  1. Other loss functions: we model the images with a discrete model so that the reported BPD makes sense; L2 corresponds to a continuous model, so it is not applicable here. One problem with the softmax loss is that it has 255*3 parameters for each pixel, which is potentially difficult to optimize, whereas a mixture of logistic distributions only has 300 parameters per pixel.
  2. We use resnets in both the encoder and the decoder. What do you mean by "standard resnet"?
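(For readers landing here: the discretized likelihood mentioned in point 1 can be sketched as below. This is an illustrative re-implementation in the spirit of the PixelCNN++ mixture-of-discretized-logistics loss, not code from this repo; the mixture size `K` and the parameters in the comparison are made up for illustration.)

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discretized_logistic_mixture_logprob(x, pis, mus, scales):
    """Log-probability of an integer sub-pixel value x in [0, 255] under a
    mixture of discretized logistic distributions.
    pis: mixture weights summing to 1; mus, scales: per-component params."""
    prob = 0.0
    for pi, mu, s in zip(pis, mus, scales):
        # Probability mass of the integer bin [x - 0.5, x + 0.5], with
        # open-ended bins at the boundary values 0 and 255.
        upper = 1.0 if x == 255 else sigmoid((x + 0.5 - mu) / s)
        lower = 0.0 if x == 0 else sigmoid((x - 0.5 - mu) / s)
        prob += pi * (upper - lower)
    return math.log(prob)

# Parameter-count comparison for a single RGB pixel (illustrative):
K = 5                        # number of mixture components (assumed)
softmax_params = 256 * 3     # one logit per discrete value per channel
mixture_params = K * 3 * 3   # (weight, mean, scale) per component, per channel
```

Because the mixture only needs a handful of (weight, mean, scale) triples per channel, its output head is far smaller than a full per-value softmax, which is the optimization point made above.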