Hello, I am training a CycleGAN with a repo based on this paper. I found that distillation works very well at 256x256 but not at 512x512. This is my thread about it. I found this repo only recently, so I plan to study the paper and the code, but I am posting this question first to get any hints on distillation. Is there a way to prevent the model from collapsing during fine-tuning, or should I change the student architecture?

What student architecture did you use?

Hello @lmxyy, thanks for checking my question. They are using an inception distiller they implemented themselves, which is not part of your code. I think I will go back to the author of the distiller, or test with your repo separately. Thanks again for checking this; I will get back to you when I have more concrete questions.

youjin-c closed this issue 2 years ago.