Hi,
Thank you so much for your code; it helped me a lot.
I have a question about the optimiser. I believe the paper says they used SGD, but when I switched the optimiser to SGD in your code, I ran into the KL vanishing problem. Training works fine if I use Adam. I don't know why this happens. Do you have any insight into this? Thank you so much!