sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

Big bug in PPO2 #35

Open Vinson-sheep opened 2 years ago

Vinson-sheep commented 2 years ago

In dist = Normal(mu, sigma), sigma must be strictly positive, but the actor_net output can be negative, so action_log_prob = dist.log_prob(action) can become NaN.

Try:

import torch
from torch.distributions import Normal

a = torch.FloatTensor([1]).cuda()   # mu
b = torch.FloatTensor([-1]).cuda()  # sigma: negative, which is an invalid scale for Normal
# note: recent PyTorch versions validate arguments and raise a ValueError here;
# pass validate_args=False to Normal to reproduce the silent NaN instead
dist = Normal(a, b)
action = dist.sample()
action_log_prob = dist.log_prob(action)

print(action.cpu().numpy())
print(action_log_prob.item())  # nan, since log(sigma) is undefined for sigma < 0
jzl20 commented 2 years ago

So how can I fix the bug?

flyinglife001 commented 1 year ago

Return sigma*sigma from the actor net, i.e. square the raw output so the scale can never be negative.
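
For concreteness, a minimal sketch of an actor head that squares its raw sigma output. The network layout and names here are illustrative, not the repo's actual PPO2 code, and the small epsilon is our addition to keep the scale strictly positive:

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.fc = nn.Linear(state_dim, 64)
        self.mu_head = nn.Linear(64, action_dim)
        self.sigma_head = nn.Linear(64, action_dim)

    def forward(self, x):
        x = torch.relu(self.fc(x))
        mu = self.mu_head(x)
        sigma = self.sigma_head(x)
        # squaring makes sigma non-negative; the epsilon (our addition, not
        # part of the original suggestion) keeps it strictly positive
        return mu, sigma * sigma + 1e-6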

WhiteNightSleepless commented 1 year ago

You can add an activation function before the output of the actor network. Softplus maps sigma to a strictly positive value; ReLU only guarantees a non-negative one (an output of exactly 0 is still an invalid scale), so Softplus is the safer choice. Hope it helps.
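
A minimal sketch of that fix, reusing the illustrative Actor above and replacing its forward method with a Softplus on the sigma head (again, the names are hypothetical, not the repo's actual code):

import torch.nn.functional as F

def forward(self, x):
    x = torch.relu(self.fc(x))
    mu = self.mu_head(x)
    # softplus(x) = log(1 + exp(x)) is strictly positive for every input,
    # so Normal(mu, sigma) always receives a valid scale
    sigma = F.softplus(self.sigma_head(x))
    return mu, sigma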