p-christ / Deep-Reinforcement-Learning-Algorithms-with-PyTorch

PyTorch implementations of deep reinforcement learning algorithms and environments

For the SAC-discrete version, is it possible to update the model with state and action as input, just like the SAC-continuous version? #62

Open dbsxdbsx opened 3 years ago

dbsxdbsx commented 3 years ago

Currently I am trying to merge the models for the SAC-discrete and SAC-continuous versions into just one model. According to the SAC-discrete critic model, it only needs the state as input and outputs an action distribution. To make it consistent with the continuous one, I modified it to take both state and action as input and to output only the Q-value for that input, q(s, a), just like what happens in the continuous version. With that change, the training part can also use the same code, without considering action distributions when updating parameters. BUT the modified SAC for discrete actions just doesn't converge!
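
To make the difference concrete, the interface change I have in mind looks roughly like this (just a sketch; the class and variable names are placeholders, not the repo's actual code):

import torch
import torch.nn as nn

class DiscreteCritic(nn.Module):
    # original discrete-style critic: Q(s) -> one Q-value per action
    def __init__(self, state_dim, num_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions))

    def forward(self, state):
        return self.net(state)  # shape (batch, num_actions)

class ContinuousStyleCritic(nn.Module):
    # modified critic: Q(s, a) -> a single Q-value, like the continuous version
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # shape (batch, 1)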

The code below is part of what I modified to make the discrete version behave like the continuous version. Since it doesn't converge, I wonder whether something is wrong with log_prob?

_dist = self.distribution(action_dist)  # torch.distributions.Categorical
actions = _dist.sample()

# modified version: log-prob built from the sampled action indices
actions = actions.unsqueeze(1)  # add batch dim
self.log_prob = torch.log(actions + (actions == 0.0).float() * 1e-8)

# original version: log-prob built from the full action distribution
# z = (action_dist == 0.0).float() * 1e-8
# self.log_prob = torch.log(action_dist + z)
# actions = actions.unsqueeze(1)  # add batch dim
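
Just to make my suspicion explicit: should the log-prob come from the probability of the sampled action rather than from the action index itself? Something like the following is what I have in mind (only a sketch, assuming action_dist holds the categorical probabilities):

_dist = torch.distributions.Categorical(probs=action_dist)
actions = _dist.sample()
# log-probability of the sampled actions, taken from the distribution
# rather than from the integer action indices
self.log_prob = _dist.log_prob(actions).unsqueeze(1)
actions = actions.unsqueeze(1)  # add batch dim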

So I wonder whether it is possible for the SAC-discrete version to update the same way as SAC-continuous. If it is, I would be happy to use almost the same code for both the discrete and continuous versions, which is what I want.
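
For completeness, the only alternative I can think of that still yields a q(s, a)-style scalar from the original discrete critic is to keep the per-action output and gather the column of the taken action, roughly like this (variable names are just placeholders):

# q_all: (batch, num_actions) output of the original discrete critic
# actions: (batch, 1) integer indices of the taken actions
q_sa = q_all.gather(1, actions)  # (batch, 1) Q-value of each taken action

But that still needs discrete-specific code, which is what I am trying to avoid.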