Thanks for your wonderful work!
I am interested in the model architecture: two branches, where the posterior and prior distributions are optimized with a KL divergence and the decoder is optimized with a separate loss. I have tried to use this architecture in some other work, but when I train the model the KL divergence is not optimized well, and I keep hitting the error "NaN or Inf found in input tensor". Did you ever run into this error? Could you share some experience on how to optimize the KL divergence?
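For reference, here is a minimal sketch of how I compute the KL term in my reimplementation. The diagonal-Gaussian parameterization, the function name, and the log-variance clamping are my own assumptions, not taken from your code, so please correct me if your setup differs:

```python
import torch


def gaussian_kl(post_mu, post_logvar, prior_mu, prior_logvar):
    """KL(q || p) between two diagonal Gaussians, q = posterior, p = prior.

    Clamping the log-variances keeps exp() from overflowing or collapsing
    to zero, which is where my NaN/Inf values seem to come from.
    """
    post_logvar = post_logvar.clamp(-10.0, 10.0)
    prior_logvar = prior_logvar.clamp(-10.0, 10.0)

    kl = 0.5 * (
        prior_logvar - post_logvar
        + (post_logvar.exp() + (post_mu - prior_mu) ** 2) / prior_logvar.exp()
        - 1.0
    )
    # Sum over the latent dimension, average over the batch.
    return kl.sum(dim=-1).mean()
```

Even with the clamping, the KL term still becomes unstable in my experiments, so I suspect I am missing something (warm-up schedule, KL weight annealing, or a different parameterization).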
Thank you very much and sorry for the inconvenience.