Closed mcbuehler closed 4 years ago
I'm sincerely sorry about the issue.
Ideally, this published version should be equivalent to my private one; there is no additional training trick in my private codebase. One of the changes I made was to make the generator and discriminator construction more generic (it was hard-coded in my private codebase), and that change unfortunately introduced the bug you found during the code publishing process.
After fixing this (I have pushed the fix to Github), I can load the CelebA 64x64 checkpoint, and the results look fine to me.
Could you provide:
Thank you for the fix.
I use tensorflow-gpu==1.9.0.
Samples generated by checkpoint 'CelebA_64x64_N2M2S32':
Samples generated by trained-from-scratch 'CelebA_64x64_N2M2S32' (epoch 68):
It looks kind of okay; what FID score did you get? For both of them, the FID score should clearly be less than 100. If it is not, there is probably a problem with the FID calculation. For the latter one, I think you just need to train it longer.
The FID calculated for the above models was far higher than 100, but I used a smaller sample size for the computation due to GPU memory constraints. I can compute it on 50K samples and report the numbers once that is done. To my eye, the images in your comment above look considerably more realistic.
but I used a smaller sample size for the computation
I don't quite follow this. For the FID calculation, we first extract Inception features batch by batch, then compute the FID score over all 50K samples at once on the CPU. Ideally, there should be no difference when you extract the Inception features with a smaller batch size.
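To illustrate why the extraction batch size should not matter, here is a minimal sketch (the helper name is mine, not from the repo, and the repo's actual FID code may differ): the Fréchet distance depends only on the mean and covariance of the pooled features, and those statistics are identical whether the features were extracted in one pass or in batches.

```python
import numpy as np
from scipy import linalg

def fid_from_stats(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians fitted to feature sets."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Features extracted in batches concatenate back to the same array,
# so the fitted statistics (and hence the FID) are unchanged.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 16))  # stand-in for Inception features
batched = np.concatenate([feats[i:i + 128] for i in range(0, len(feats), 128)])
assert np.allclose(batched.mean(axis=0), feats.mean(axis=0))
```

Only the number of samples used to fit the statistics changes the score, not how the feature extraction was chunked.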
Note that I got the results in my previous comment by cloning the latest code and running it on my local machine (which has TF 1.9.0). I think that's just sampling bias.
I think that's just sampling bias.
I agree that we should get an FID at least in a similar range. However, we recalculated the FID for the provided pre-trained models on 50K images: for CelebA we get FID 292, and for LSUN we get FID 365.
Sorry, let me clarify, there were two different points in our discussion:
I suppose that you switched to imageio since you can't import scipy.misc.imread. That's because scipy removed misc in version 1.2.0. You may try downgrading scipy to 1.1.0 and running again (including re-computing the FID statistics for the real data) to see if the resulting FID becomes normal. If that still does not work, you may upload your pre-calculated FID stats file (for the real data) here and let me check whether the values are identical to mine.
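For reference, the removal can be detected up front. A minimal sketch (the helper name is mine, not from the repo) that compares version components numerically, since a plain string comparison would mis-order e.g. "1.10" and "1.2":

```python
def scipy_has_imread(version: str) -> bool:
    """scipy.misc.imread is only available in SciPy releases before 1.2.0."""
    major, minor = (int(x) for x in version.split(".")[:2])
    return (major, minor) < (1, 2)

# Callers can then pick an API-compatible reader, e.g.:
#   if scipy_has_imread(scipy.__version__): from scipy.misc import imread
#   else:                                   from imageio import imread
```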
Thanks!
I recalculated the FID loading images with scipy.misc.imread, and I again got a high value (295).
Here is a link to the pre-calculated FID statistics on 50K images from the CelebA dataset. Could you please check if these values are in a similar range as your pre-computed statistics?
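A quick way to compare two pre-computed statistics files, assuming they are .npz archives holding "mu" and "sigma" arrays (the key names are a guess; adjust them to whatever format the repo's FID script writes):

```python
import numpy as np

def stats_match(path_a, path_b, atol=1e-3):
    """Return True if two pre-computed FID statistics files agree within tolerance.
    Assumes .npz archives with 'mu' (mean) and 'sigma' (covariance) arrays."""
    with np.load(path_a) as a, np.load(path_b) as b:
        return bool(np.allclose(a["mu"], b["mu"], atol=atol)
                    and np.allclose(a["sigma"], b["sigma"], atol=atol))
```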
My apologies. I have checked your statistics, and they do not match mine. Furthermore, I can now reproduce the issue: there must be a bug in my data preprocessing code. I did not specifically verify the correctness of the preprocessing, and I accidentally used my cached statistics when verifying the correctness of the model implementation, which is why the issue was invisible to me.
I will investigate this issue and get back to you ASAP. However, it will take me some time, as I'm a little busy now.
Sorry again for all the inconvenience I caused.
Hi @mcbuehler, sorry that my mistakes wasted your time. The issue should now be resolved. There were two bugs introduced during the code publishing process, which caused (i) the FID statistics to be wrong, and (ii) the pretrained model to be corrupted while loading from the checkpoint. I can now reproduce (with a fresh run from scratch) the FID score of the pretrained CelebA 64x64 model.
Note that you will have to re-compute the FID statistics after pulling from GitHub. Also, the latest code is not completely backward compatible: to preserve model performance, you may need to re-train any generators you trained yourself, though I expect there won't be much difference in performance.
Please kindly let me know if there is still any issue.
Hi @mcbuehler, can you reproduce the FID now?
Yes, I get FID 3.5 for the provided pre-trained model (CelebA_64x64_N2M2S32). Thank you for the bugfix.
We have not been able to reproduce the results given the code in this repository. Here is what we have tried.
We loaded the provided pre-trained weights for the models “CelebA_128x128_N2M2S64” and “CelebA_64x64_N2M2S32” and ran inference. However, the generated images do not look as good as the ones in the paper, and the calculated FID is far higher than expected (around 300). The only modification we made to the code was to replace scipy.misc.imread with imageio.imread.
In addition, we re-trained the 64x64 model with the configuration you provided (“configs/CelebA_64x64_N2M2S32.yaml”). We had to change the variable basic_layers on line 51 of model/discriminator.py from [2, 4, 8, 8] to [1, 4, 8, 8] to match the pre-trained weight dimensions. This experiment also yielded a high FID and unrealistic images.
How did you train the provided weights? Did you use the private codebase? What might be a reason why we cannot reproduce the results?
Thank you.