Open luc-leonard opened 2 years ago
Hi, I checked the code and I think you are right! Thanks a lot for the catch! I will commit the fixes.
Thank you very much for the very quick answer and fix :D
The loss goes to 'nan' when I load the correct ckpt. Do you have this problem? I trained on the VAS dataset.
Hi, I want to ask about the parameters of lpaps. The vggishish16 model is trained on vggsound. How did you get the parameters of the following layers? Did you directly use the pre-trained model from taming transformers?
self.lin0 = NetLinLayer(self.chns[0], use_dropout=use_dropout)
self.lin1 = NetLinLayer(self.chns[1], use_dropout=use_dropout)
self.lin2 = NetLinLayer(self.chns[2], use_dropout=use_dropout)
self.lin3 = NetLinLayer(self.chns[3], use_dropout=use_dropout)
self.lin4 = NetLinLayer(self.chns[4], use_dropout=use_dropout)
You may train them by adapting the training script from https://github.com/richzhang/PerceptualSimilarity.
Can you share the code you used to train lpaps on the vggsound dataset?
Ok, I managed to look into this issue a bit more.
Thanks to your questions I discovered that this problem is actually deeper than I originally anticipated. It seems that I completely missed that the NetLinLayer layers have trainable parameters, and I relied only on training VGGishish. I think that because the code did not complain when loading the checkpoint, as the topic starter noticed, I just moved on.
What happens is that these layers are actually randomly initialized and, luckily, the model could still train to such great quality thanks to the GAN loss. This means that you can just drop the perceptual loss from the model and it will train much faster and reach the same performance. On the practical side, it seems that with this dorky loss you may still get a bit of a boost in quality.
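For anyone who wants to catch this kind of silent mismatch in their own code, here is a minimal sketch of a key check to run before loading a checkpoint. It is not the code from this repo: the function name and the key names below are hypothetical, and plain dicts stand in for real state dicts.

```python
def diff_state_dict_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, sorted for stable output."""
    expected, found = set(expected_keys), set(checkpoint_keys)
    missing = sorted(expected - found)      # the model needs these, the checkpoint lacks them
    unexpected = sorted(found - expected)   # the checkpoint has these, the model does not
    return missing, unexpected


# Hypothetical key names, only for illustration.
expected = ["features.0.weight", "lin0.weight", "lin1.weight"]
checkpoint = {"model": {"features.0.weight": [0.1, 0.2]}}

missing, unexpected = diff_state_dict_keys(expected, checkpoint["model"])
if missing:
    # A lenient loader would stop here silently, leaving these layers at their random init.
    print("not found in checkpoint:", missing)
```

Raising an error instead of printing would make the mismatch impossible to overlook.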
Thanks for your reply. I understand it.
Today I had a chance to inspect the issue a bit more thanks to @jhyau.
It seems that @jwliu-cc was right and these fixes cause codebook training to diverge to NaNs. For this reason, I am resetting the commits mentioned in this issue to the initial well-tested state, despite the nasty bug with vggish and lpaps checkpoint loading 🙁.
Current solution:
perceptual_weight=0.0
This means that those who want to build upon SpecVQGAN could turn off the perceptual loss by setting the weight to zero and benefit from a significant speedup during training. This, however, would yield slightly different results which, according to our ablations, are still strong.
I also added a notice about it to the README for other people to see.
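To illustrate why a zero weight can also save compute, here is a rough sketch of a loss combiner that skips the perceptual forward pass when the weight is zero. This is an assumption about how such a weight could gate the term, not the actual SpecVQGAN loss code; apart from perceptual_weight, every name here is hypothetical.

```python
def total_loss(rec_loss, gan_loss, perceptual_fn, perceptual_weight=1.0, gan_weight=1.0):
    """Combine loss terms; only call perceptual_fn when its weight is non-zero."""
    loss = rec_loss + gan_weight * gan_loss
    if perceptual_weight > 0.0:
        # perceptual_fn stands in for the expensive feature-extractor forward pass.
        loss = loss + perceptual_weight * perceptual_fn()
    return loss


calls = []

def fake_perceptual():
    """Hypothetical stand-in that records each call and returns a constant."""
    calls.append(1)
    return 2.0


loss_off = total_loss(1.0, 0.5, fake_perceptual, perceptual_weight=0.0)
calls_after_off = len(calls)  # still 0: the perceptual term was never evaluated
loss_on = total_loss(1.0, 0.5, fake_perceptual, perceptual_weight=0.5)
```

With the weight at zero the stand-in is never called, which is where the speedup would come from in a setup like this one.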
Hello.
the vggishish_lpaps checkpoint is used here:
Errors are ignored in the code, but neither lpaps nor vggishish manages to load it.
The checkpoint URL is here: https://github.com/v-iashin/SpecVQGAN/blob/eee222d8351df9b6314db69185d5ce8ca55b50c8/specvqgan/util.py#L8
The vggish weights can be found under the 'model' key, but I cannot find the lpaps weights anywhere in there. Are they not required?
Best regards,