Open kc-ustc opened 6 months ago
Thanks for sharing! The paper states that "The encoder can then be trained by maximizing the log-likelihood of samples (z, s, s′) collected from the policy". How does this objective relate to '_calc_enc_error'?