Thanks for your wonderful work!
I am interested in the model architecture: two branches, where the posterior and prior distributions are optimized with a KL divergence and the decoder is optimized with a separate loss. I have tried to use this architecture in some other work, but when I train the model the KL divergence is not optimized well, and I keep hitting the error "NaN or Inf found in input tensor". Did you ever run into this error? Could you share some experience on how to optimize the KL divergence?
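For reference, here is a minimal sketch of how I compute the KL term in my reimplementation. The diagonal-Gaussian parameterization, the function name, and the log-variance clamping are my own assumptions, not taken from your code, so please correct me if your setup differs:

```python
import torch


def gaussian_kl(post_mu, post_logvar, prior_mu, prior_logvar):
    """KL(q || p) between two diagonal Gaussians, q = posterior, p = prior.

    Clamping the log-variances keeps exp() from overflowing or collapsing
    to zero, which is where my NaN/Inf values seem to come from.
    """
    post_logvar = post_logvar.clamp(-10.0, 10.0)
    prior_logvar = prior_logvar.clamp(-10.0, 10.0)

    kl = 0.5 * (
        prior_logvar - post_logvar
        + (post_logvar.exp() + (post_mu - prior_mu) ** 2) / prior_logvar.exp()
        - 1.0
    )
    # Sum over the latent dimension, average over the batch.
    return kl.sum(dim=-1).mean()
```

Even with the clamping, the KL term still becomes unstable in my experiments, so I suspect I am missing something (warm-up schedule, KL weight annealing, or a different parameterization).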
Thank you very much and sorry for the inconvenience.