ikostrikov / pytorch-trpo

PyTorch implementation of Trust Region Policy Optimization
MIT License
433 stars, 91 forks

Is the get_kl() function correct? #13

Closed: zzzxxxttt closed this issue 6 years ago

zzzxxxttt commented 6 years ago

Thanks for your great code! I noticed that in the function get_kl(), you use the policy net to generate the mean, log_std and std, then copy these three tensors and calculate the KL divergence between the distribution given by the original tensors and the one given by the copies, which is obviously zero all the time. Is this a bug or intended behavior?
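The computation described above can be sketched as follows. This is a minimal, hypothetical example, not the repository's code. It assumes a toy linear Gaussian policy with a state-independent `log_std` and uses `.detach()` to make the "copies". It checks the KL value, its gradient, and a Hessian-vector product with respect to the policy parameters.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy Gaussian policy (an assumption, not the repo's network):
# a linear layer for the mean and a state-independent log standard deviation.
policy_mean = torch.nn.Linear(3, 2)
policy_log_std = torch.nn.Parameter(torch.zeros(1, 2))
params = list(policy_mean.parameters()) + [policy_log_std]

states = torch.randn(5, 3)


def get_kl():
    mean1 = policy_mean(states)
    log_std1 = policy_log_std.expand_as(mean1)
    std1 = torch.exp(log_std1)
    # Detached copies: same values, but no gradient flows through them.
    mean0, log_std0, std0 = mean1.detach(), log_std1.detach(), std1.detach()
    # KL(pi_0 || pi_1) between diagonal Gaussians, per action dimension.
    kl = (log_std1 - log_std0
          + (std0.pow(2) + (mean0 - mean1).pow(2)) / (2.0 * std1.pow(2))
          - 0.5)
    return kl.sum(1, keepdim=True)  # sum over action dimensions


kl = get_kl().mean()
print("KL value:", kl.item())

# First derivative w.r.t. the policy parameters (graph kept for a 2nd pass).
grads = torch.autograd.grad(kl, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])
print("max |grad|:", flat_grad.abs().max().item())

# Hessian-vector product with an arbitrary vector v (here all ones).
v = torch.ones_like(flat_grad)
hvp = torch.autograd.grad((flat_grad * v).sum(), params)
flat_hvp = torch.cat([h.reshape(-1) for h in hvp])
print("max |Hv|:", flat_hvp.abs().max().item())
```

In this sketch the KL value and its gradient are zero (up to floating-point error), as the question observes, but the Hessian-vector product is not: the detached copies fix the reference distribution while the second derivative still measures how the KL curves around the current parameters. In TRPO's conjugate-gradient step the KL enters through such Hessian-vector products, so a zero value on its own does not show that the function is wrong.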