Closed: amjass12 closed this issue 1 year ago
Good questions. 1) The actor device and the critic device refer to the same thing: they are just different variable names for the same physical device on your machine. 2) I left out the KL divergence penalty because, from my initial reading of the paper, performance with the clipped loss was generally better.
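For context on the second point, here is a minimal sketch of the clipped surrogate objective the answer refers to. The function name and signature are my own, not taken from the tutorial; the clip coefficient of 0.2 is the value suggested in the PPO paper.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r_t = pi_new / pi_old, computed in log space
    # for numerical stability.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clamp the ratio to [1 - eps, 1 + eps] before weighting.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO-clip takes the elementwise minimum of the two terms, then the
    # mean; the sign is flipped so gradient descent maximizes the objective.
    return -torch.min(unclipped, clipped).mean()
```

Because of the `min`, the clipped objective removes the incentive to move the policy far from the old one, which is why it can replace the explicit KL penalty from the paper's other variant.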
Thank you so much for the quick response and clear explanations!
Hi, thank you so much for this guide - it is extremely clear and easy to follow! This isn't a bug report, but I have a few questions. First: why is the 'values' tensor sent to the actor device (referring to this line)?
values = T.tensor(values).to(self.actor.device)
The values tensor is not used solely by the actor but by both networks: it enters the MSE loss for the critic, and the critic loss is then added to the actor loss to form the total loss.
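To make the point concrete, here is a minimal sketch of how the values tensor feeds both loss terms. The numbers are made up, the names (`advantage`, `critic_value`, `actor_loss`) and the `0.5` critic coefficient are assumptions standing in for the tutorial's variables, and `actor_loss` is a placeholder scalar rather than a real surrogate loss.

```python
import torch

# Either network's device works here, since both live on the same device.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

values = torch.tensor([0.5, 0.8, 0.3]).to(device)        # stored value estimates
advantage = torch.tensor([0.1, -0.2, 0.4]).to(device)    # computed advantages
critic_value = torch.tensor([0.6, 0.7, 0.5]).to(device)  # fresh critic output

# 'values' enters the critic's MSE target via the returns ...
returns = advantage + values
critic_loss = ((returns - critic_value) ** 2).mean()

# ... and that critic loss is folded into the combined objective,
# so 'values' influences the total loss, not just the actor.
actor_loss = torch.tensor(0.25).to(device)  # placeholder scalar
total_loss = actor_loss + 0.5 * critic_loss
```

Since both networks sit on the same physical device, `.to(self.actor.device)` and `.to(self.critic.device)` are interchangeable here; the variable name is cosmetic.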
And second: why did you not include a KL divergence term in your implementation? Was there a specific reason?
Thank you so much again!