In dist = Normal(mu, sigma) , sigma must be strictly positive, but the actor_net output can be negative (or zero), so action_log_prob = dist.log_prob(action) can be nan.
Try:
import torch
from torch.distributions import Normal

a = torch.FloatTensor([1]).cuda()
b = torch.FloatTensor([-1]).cuda()  # negative sigma
# Recent PyTorch versions validate arguments and raise ValueError for a
# negative scale; pass validate_args=False to reproduce the silent nan.
dist = Normal(a, b, validate_args=False)
action = dist.sample()
action_log_prob = dist.log_prob(action)
print(action.cpu().numpy())
print(action_log_prob.item())  # nan, because log_prob takes log(sigma)
You can add an activation function on the sigma output of the actor network. softplus (or exp) guarantees a strictly positive sigma; note that relu is not enough, since it can still output exactly zero, which is also an invalid scale. Hope it helps.
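A minimal sketch of that fix, with a hypothetical actor network (the layer sizes and names here are illustrative, not from the original code):

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class Actor(nn.Module):
    def __init__(self, obs_dim=4, act_dim=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.mu_head = nn.Linear(64, act_dim)
        self.sigma_head = nn.Linear(64, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        mu = self.mu_head(h)
        # softplus keeps sigma strictly positive; the small epsilon
        # guards against underflow to exactly zero
        sigma = nn.functional.softplus(self.sigma_head(h)) + 1e-5
        return mu, sigma

actor = Actor()
mu, sigma = actor(torch.randn(1, 4))
dist = Normal(mu, sigma)          # sigma > 0, so no ValueError / nan
action = dist.sample()
log_prob = dist.log_prob(action)
print(torch.isfinite(log_prob).all())  # tensor(True)
```

With softplus on the sigma head, log_prob is always finite regardless of what the linear layer outputs.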