BeautyGlow opened this issue 5 years ago
Thanks @BeautyGlow for pointing this out. We are now running an additional experiment to make this comparison more rigorous. Because the affine coupling layer is a special case of our method (K=2 with h(x_1)=x_1), comparing by simply replacing the dynamic linear transformation with affine coupling in our code will be more convincing. We will update our code and results soon to support our conclusion.
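For readers unfamiliar with the connection: a standard affine coupling layer (as in RealNVP/Glow) splits the input into two parts, passes the first through unchanged, and applies an affine transform to the second. This is a minimal sketch of that special case, assuming the K=2 partition and identity h(x_1)=x_1 described above; the scale and shift functions `s` and `t` below are toy stand-ins, not the paper's actual networks.

```python
import math

def s(x1):
    # toy scale network (in practice a learned neural network)
    return math.tanh(x1)

def t(x1):
    # toy shift network (in practice a learned neural network)
    return 0.5 * x1

def coupling_forward(x1, x2):
    # K=2 split: x1 passes through as h(x_1) = x_1 (identity),
    # x2 gets an affine transform conditioned on x1
    y1 = x1
    y2 = x2 * math.exp(s(x1)) + t(x1)
    return y1, y2

def coupling_inverse(y1, y2):
    # the inverse is exact and cheap, which is why coupling
    # layers are popular building blocks for normalizing flows
    x1 = y1
    x2 = (y2 - t(x1)) * math.exp(-s(x1))
    return x1, x2
```

Round-tripping `coupling_inverse(*coupling_forward(x1, x2))` recovers the input exactly, which is the invertibility property any replacement transformation in the ablation must preserve.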
I am very interested in your paper, and more training details would help me a lot. How many GPUs did you use for training, and how long did one epoch take? I followed the settings you give for ImageNet 32x32, but each epoch takes about 5 hours on a single 2080Ti GPU. You also claim that the results are obtained in 50 epochs and that your model is more than 10 times as efficient as Glow. However, your definition of an epoch differs from Glow's. Specifically, in Glow an epoch covers n_train images (default 50,000), whereas, if I understand correctly, in your paper one epoch means processing every image in the training set; for ImageNet, that is 1.28M images per epoch. Given this discrepancy, how do you evaluate the efficiency of your model? Thanks a lot.
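To make the size of the discrepancy concrete, here is the back-of-the-envelope arithmetic implied by the numbers in the comment above (Glow's default n_train = 50,000 images per "epoch" versus a full pass over ImageNet's ~1.28M training images); the exact figures are taken from the comment, not from either codebase.

```python
# Glow's definition: one epoch = n_train images (default 50,000)
glow_images_per_epoch = 50_000

# The paper's definition (as read by the questioner):
# one epoch = one full pass over the ImageNet training set
paper_images_per_epoch = 1_280_000

# How many Glow-style epochs fit in one full-dataset epoch
ratio = paper_images_per_epoch / glow_images_per_epoch
print(ratio)  # 25.6

# So 50 full-dataset epochs correspond to 50 * 25.6 = 1280
# Glow-style epochs, which is why comparing raw epoch counts
# across the two definitions is misleading.
print(50 * ratio)  # 1280.0
```

Comparing wall-clock time or total images processed (rather than epoch counts) would sidestep the mismatch entirely.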