twni2016 / Meta-SAC

Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient - 7th ICML AutoML Workshop, 2020
MIT License
30 stars 4 forks

Several Issues of the Meta-SAC Implementation #2

Closed: Cogito2012 closed this issue 4 years ago

Cogito2012 commented 4 years ago

@twni2016 @yufeiwang63 Thanks for your great work! I am new to reinforcement learning and have some questions about your proposed meta-SAC.

Thanks again for your work! Looking forward to your reply.

yufeiwang63 commented 4 years ago

Hi, thanks for your interest in our work. As per your questions:

1) Value network: we do not use the value network in the final version of Meta-SAC. It is only used in one of the ablation studies, where we use the classic Q-value in the meta-objective. In terms of code usage, the value network is only used when the 'meta-Q' flag is set to true. Please check our paper for more details on the meta-Q ablation.

2) alpha_embedding: we do not actually use alpha_embedding in the final version of Meta-SAC. It is a legacy of early experiments. If I recall correctly, adding it did not improve performance, which is why we eventually discarded it.

3) Two Q losses: this is really just a matter of implementation. In the original implementation of SAC, the critic holds two inner Q networks, while in our code we explicitly maintain two critics, each with one Q network. That is why the losses are used in a different way, but they amount to the same update.
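
To make the third point concrete, here is a minimal PyTorch sketch (not the repository's actual code; class and function names are illustrative) contrasting the two critic layouts: one critic module holding two inner Q networks trained with a single summed loss, versus two separate critics, each with one Q network and its own loss. Both regress toward the same clipped double-Q target, so the resulting updates are equivalent.

```python
# Illustrative sketch only, assuming standard SAC critic training in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """One Q(s, a) head: a small MLP over the concatenated state-action pair."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

# Layout A (original SAC implementation): one critic module with two inner
# Q networks, trained with a single summed loss.
class TwinCritic(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.q1 = QNetwork(obs_dim, act_dim)
        self.q2 = QNetwork(obs_dim, act_dim)

    def loss(self, obs, act, target):
        return (F.mse_loss(self.q1(obs, act), target)
                + F.mse_loss(self.q2(obs, act), target))

# Layout B (as described above for this repo): two separate critics, each
# with one Q network and its own loss/optimizer step.
def separate_critic_losses(critic1, critic2, obs, act, target):
    loss1 = F.mse_loss(critic1(obs, act), target)
    loss2 = F.mse_loss(critic2(obs, act), target)
    return loss1, loss2

# In both layouts the regression target is the same clipped double-Q target:
#   target = r + gamma * (min(Q1', Q2')(s', a') - alpha * log pi(a'|s'))
# so summing the two losses or stepping them separately yields the same update.
```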