Sampling of VPG should be over D*T

openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.

https://spinningup.openai.com/

MIT License


Open HunderlineK opened 2 years ago


I'm not 100% sure if this change is correct, but in the code we have:

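    # get_policy is defined elsewhere; assumed here to return an action distribution for obs
    def compute_loss(obs, act, weights):
        # log-probability of each action taken, one entry per observation in the batch
        logp = get_policy(obs).log_prob(act)
        # .mean() divides by the total number of entries, i.e. every observation in the batch
        return -(logp * weights).mean()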

Here the loss is averaged over the total number of observations, i.e. every timestep in the batch (the D*T of the title), not over D alone.

I also tried averaging over D only, and with that change the results do not improve properly.
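To make the two normalizations concrete, here is a minimal sketch in plain Python (not the repository's PyTorch code). The function names and all numbers are made up for illustration, and it assumes D counts trajectories and that every trajectory has the same length T.

```python
# Illustrative only: hypothetical helpers and made-up values, not repository code.

def loss_mean_over_timesteps(logp, weights):
    """Average -(logp * weight) over every timestep in the batch (D*T terms)."""
    terms = [lp * w
             for traj_lp, traj_w in zip(logp, weights)
             for lp, w in zip(traj_lp, traj_w)]
    return -sum(terms) / len(terms)


def loss_mean_over_trajectories(logp, weights):
    """Sum -(logp * weight) within each trajectory, then average over D trajectories."""
    per_traj = [sum(lp * w for lp, w in zip(traj_lp, traj_w))
                for traj_lp, traj_w in zip(logp, weights)]
    return -sum(per_traj) / len(per_traj)


# D = 2 trajectories, each with T = 4 timesteps (assumed equal lengths).
logp = [[-0.5, -1.0, -0.25, -0.25],
        [-1.0, -1.0, -0.5, -0.5]]
weights = [[2.0, 2.0, 2.0, 2.0],
           [4.0, 4.0, 4.0, 4.0]]

per_step = loss_mean_over_timesteps(logp, weights)     # divides by D*T = 8
per_traj = loss_mean_over_trajectories(logp, weights)  # divides by D = 2

print(per_step, per_traj)  # 2.0 8.0
```

Under the equal-length assumption the two losses differ by exactly a factor of T (here 4), so the gradients point in the same direction and differ only in scale. With trajectories of varying length this no longer holds, which may be part of why the two variants behave differently in practice.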