jayeshs999 / sapg

Code for SAPG: Split and Aggregate Policy Gradients (ICML 2024)
MIT License

Policy networks are independent and central value network is fully shared without ID-based conditioning? #3

Open Naoki04 opened 5 days ago

Naoki04 commented 5 days ago

Hi, thank you for sharing your great work! I have a question about the implementation of policy and value networks.

In the paper, both the policy and value networks are described as having backbone models shared across all policies, conditioned on the policy ID by adding the ID to the input. However, when I look into rl_games/rl_games/algos_torch/models.py, the policy networks seem to be implemented as independent networks, and the value network appears to be fully shared without any conditioning on the policy ID. This seems to differ from the approach described in the paper.
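For reference, my understanding of the ID-based conditioning described in the paper is something like the sketch below: a one-hot policy ID is appended to the observation before it is fed to the shared value network. This is purely illustrative; the function and variable names are my own and not from the repo.

```python
import numpy as np

NUM_POLICIES = 4  # assumed number of parallel policies (illustrative)
OBS_DIM = 8       # assumed observation dimension (illustrative)

def condition_on_policy_id(obs, policy_id, num_policies=NUM_POLICIES):
    """Append a one-hot policy ID to the observation vector,
    as I understood the paper's conditioning scheme."""
    one_hot = np.zeros(num_policies)
    one_hot[policy_id] = 1.0
    return np.concatenate([obs, one_hot])

obs = np.ones(OBS_DIM)
x = condition_on_policy_id(obs, policy_id=2)
# The shared value network would then consume this
# (OBS_DIM + NUM_POLICIES)-dimensional input.
print(x.shape)  # (12,)
```

Is something along these lines happening anywhere in the code, or was it dropped?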

I would appreciate clarification on whether this change was intentional or if there is another location in the code where the ID-based conditioning is implemented.

Naoki

Naoki04 commented 3 days ago

Additionally, it seems that only one model, ModelA2CContinuousLogStd, is used for both PPO and SAPG, rather than ModelMultiA2CContinuousLogStd. Could you tell me how the SAPG algorithm is implemented?