Open yanghoonkim opened 1 year ago
Thanks for the great implementation. I noticed that in ppo_continuous the advantage is computed only once, right after the rollout. Shouldn't it be recomputed inside the PPO epoch loop?
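To make the question concrete, here is a minimal sketch of the two placements. It is not the repository's code: `compute_gae`, the toy rollout arrays, and the "+0.1" stand-in for a critic update are all hypothetical, and it assumes the advantage is a GAE-style estimate. It only illustrates that an advantage computed once after the rollout uses the pre-update value estimates, while one recomputed inside the epoch loop would use the updated ones.

```python
import numpy as np

def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """GAE over one rollout (hypothetical helper, not from the repo).

    values[t] is V(s_t), last_value is the bootstrap V(s_T),
    dones[t] is 1.0 if the episode ended at step t.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # One-step TD error, then the discounted recursive sum.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages

# Toy rollout (made-up numbers).
rewards = np.array([1.0, 1.0, 1.0])
dones = np.zeros(3)
values_old = np.array([0.5, 0.5, 0.5])

# Placement 1: computed once, right after the rollout.
adv_once = compute_gae(rewards, values_old, 0.0, dones)

# Placement 2: recomputed inside the epoch loop, after the critic has changed.
values_new = values_old + 0.1  # stand-in for one critic update
adv_recomputed = compute_gae(rewards, values_new, 0.0, dones)

print(adv_once)
print(adv_recomputed)
```

The two arrays differ only because the value estimates changed between the calls. Whether that difference matters for training is exactly what the question is asking.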