Hi, thank you for sharing your great work! I have a question about the implementation of policy and value networks.
In the paper, both the policy and value networks are described as having a shared backbone across all policies, conditioned on the policy ID by appending the ID to the input. However, when I look into rl_games/rl_games/algos_torch/models.py, the policy networks seem to be implemented as independent networks, and the value network appears to be fully shared without any conditioning on the policy ID. This seems to differ from the approach described in the paper.
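For reference, this is a minimal sketch of what I understood the paper to describe: a shared backbone with a one-hot policy ID appended to the observation. The class and argument names here are my own illustration, not taken from the repo:

```python
import torch
import torch.nn as nn

class SharedIDConditionedNet(nn.Module):
    """Illustrative only: shared backbone conditioned on a one-hot policy ID.
    Names are hypothetical, not from the SAPG / rl_games code."""

    def __init__(self, obs_dim, num_policies, hidden_dim, out_dim):
        super().__init__()
        self.num_policies = num_policies
        # Backbone shared by all policies; the policy ID is part of the input.
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim + num_policies, hidden_dim),
            nn.ELU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ELU(),
        )
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, obs, policy_id):
        # policy_id: (batch,) integer tensor identifying which policy acts.
        one_hot = nn.functional.one_hot(policy_id, self.num_policies).float()
        x = torch.cat([obs, one_hot], dim=-1)
        return self.head(self.backbone(x))
```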
I would appreciate clarification on whether this change was intentional or if there is another location in the code where the ID-based conditioning is implemented.
Additionally, it seems that only ModelA2CContinuousLogStd is used for both PPO and SAPG, rather than ModelMultiA2CContinuousLogStd. Could you tell me how the SAPG algorithm is implemented?
Naoki