Hi keng,
I have some questions about SAC-discrete.
I found this implementation: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch, which does not use Gumbel-softmax. Its target entropy for discrete actions is set to a positive value, `-np.log(1.0/action_space.size()) * 0.98`, and `log_alpha` grows above 1.0 over the update steps. However, the continuous SAC in the same repo uses a negative value, `-np.prod(action_space.size())`.
In your code, by contrast, you use Gumbel-softmax and set the target entropy for both discrete and continuous action spaces to a negative value, `-np.prod(action_space.size())`, so `log_alpha` decreases over the update steps.
I really want to know: how should I set the target entropy? Why is the target entropy in @p-christ's code different from yours?
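To make the two conventions I'm comparing concrete, here is a minimal sketch of both formulas. The variable names (`n_actions`, `continuous_shape`) and the example sizes are just illustrative, not taken from either repo:

```python
import numpy as np

# Illustrative action-space sizes (assumed for this sketch)
n_actions = 4            # a discrete space with 4 actions
continuous_shape = (2,)  # a continuous space with 2 action dimensions

# p-christ's SAC-discrete convention: a fraction (0.98) of the maximum
# entropy of a uniform policy over n_actions -> a positive target.
target_entropy_discrete = -np.log(1.0 / n_actions) * 0.98  # ≈ 1.36 (positive)

# The common continuous-SAC convention: minus the action
# dimensionality -> a negative target.
target_entropy_continuous = -np.prod(continuous_shape)  # -2 (negative)

print(target_entropy_discrete, target_entropy_continuous)
```

So with the discrete convention the entropy target is positive (alpha is pushed up until the policy stays fairly stochastic), while with the `-np.prod(...)` convention the target is negative, which is why `log_alpha` moves in opposite directions in the two codebases.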
https://stackoverflow.com/questions/56226133/soft-actor-critic-with-discrete-action-space
@kengz