Hi, thanks for your interest in our work. As per your questions:

1) Value network: we do not use the value network in the final version of meta-SAC. It is only used in one of the ablation studies, where we use the classic Q-value in the meta-objective. In the code, the value network is only used when the 'meta-Q' flag is set to true. Please check our paper for more details on the meta-Q ablation.

2) alpha_embedding: we do not actually use alpha_embedding in the final version of meta-SAC. It is a legacy of early experiments. If I recall correctly, adding it did not improve performance, which is why we eventually discarded it.

3) Two Q losses: this is really just a matter of implementation. In the original implementation of SAC, the critic holds two inner Q networks, while in our code we explicitly maintain two critics, each with one Q network. That is why the losses are applied differently, but they are essentially the same update; see the sketch below.
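To make point 3 concrete, here is a minimal sketch of the two critic layouts (module and variable names are made up for illustration and are not the actual classes in this repo or in pytorch-soft-actor-critic). Because the two Q functions share no parameters, summing the two losses inside one module versus keeping two separate critic losses yields the same per-parameter gradients, i.e. the same update.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions for illustration only.
obs_dim, act_dim, hidden = 8, 2, 64

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

# Layout A: one critic module holding two inner Q networks.
class TwinQCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.q1 = mlp(obs_dim + act_dim, 1)
        self.q2 = mlp(obs_dim + act_dim, 1)

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        return self.q1(x), self.q2(x)

# Layout B: two separate critics, one Q network each.
critic1 = mlp(obs_dim + act_dim, 1)
critic2 = mlp(obs_dim + act_dim, 1)

obs, act = torch.randn(32, obs_dim), torch.randn(32, act_dim)
target = torch.randn(32, 1)  # stand-in for the shared Bellman target

# Layout A: one summed loss, one optimizer step over both Q networks.
twin = TwinQCritic()
qa1, qa2 = twin(obs, act)
loss_a = F.mse_loss(qa1, target) + F.mse_loss(qa2, target)

# Layout B: two losses, one per critic. The gradient each Q network receives is
# identical to Layout A, since the summed loss in Layout A decouples over the two heads.
x = torch.cat([obs, act], dim=-1)
loss_b1 = F.mse_loss(critic1(x), target)
loss_b2 = F.mse_loss(critic2(x), target)
```

In practice each loss is followed by its own backward pass and optimizer step; either way, the per-parameter gradients on the Q networks are the same.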
@twni2016 @yufeiwang63 Thanks for your great work! I am new to reinforcement learning and have some questions about your proposed meta-SAC.
One of my questions: I noticed that the value network is used in sacmeta.py, while it is not used in sac.py. The author of pytorch-soft-actor-critic explains that it does not make much of a difference for training stability, but do you think the difference can still be ignored for your meta-SAC algorithm? Thanks again for your work! Looking forward to your reply.
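For context on the value-network question, here is a minimal sketch of the two critic-target formulations involved (tensor names and constants below are placeholders, not code from this repo or from pytorch-soft-actor-critic): the first bootstraps from a separate target value network, as in the first SAC paper, while the second forms the soft value directly from the target Q-networks and the policy's log-probability, as in the updated SAC.

```python
import torch

# Placeholder tensors standing in for quantities computed during a SAC update.
reward = torch.randn(32, 1)          # r(s, a)
done = torch.zeros(32, 1)            # episode-termination mask
gamma, alpha = 0.99, 0.2             # discount factor and entropy temperature
q1_targ = torch.randn(32, 1)         # target Q1(s', a'), with a' sampled from the policy
q2_targ = torch.randn(32, 1)         # target Q2(s', a')
logp_next = torch.randn(32, 1)       # log pi(a' | s')
v_targ_next = torch.randn(32, 1)     # output of a separate target value network V(s')

# With a value network: the critic target bootstraps from V(s').
target_with_v = reward + gamma * (1 - done) * v_targ_next

# Without a value network: the soft value is formed from the target Q-networks
# and the policy's entropy term, so the separate V network is unnecessary.
soft_v = torch.min(q1_targ, q2_targ) - alpha * logp_next
target_without_v = reward + gamma * (1 - done) * soft_v
```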