Does the current implementation use the empirical Fisher?
https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/a6afcb3d24f57cdafdbddc305e5213f8444c8678/a2c_ppo_acktr/algo/a2c_acktr.py#L54
This is probably not the best idea; see this paper:
https://papers.nips.cc/paper/8669-limitations-of-the-empirical-fisher-approximation-for-natural-gradient-descent
Also, the implementation in the ACKTR paper uses the actual Fisher:
https://github.com/openai/baselines/blob/9ee399f5b20cd70ac0a871927a6cf043b478193f/baselines/acktr/acktr.py#L47
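For concreteness, here is the distinction I'm asking about as a minimal PyTorch sketch (standalone, with illustrative names; not the repo's actual variables):

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)
logits = torch.randn(32, 4, requires_grad=True)  # stand-in for policy(obs)
dist = Categorical(logits=logits)

# Empirical Fisher: score of the actions that were actually taken and stored.
stored_actions = torch.randint(0, 4, (32,))      # stand-in for rollout actions
empirical_fisher_loss = -dist.log_prob(stored_actions).mean()

# True Fisher (Monte Carlo estimate): score of actions freshly sampled from
# the model's own current distribution.
true_fisher_loss = -dist.log_prob(dist.sample()).mean()
```

As I understand it, KFAC accumulates its curvature statistics from the gradients of this "Fisher loss", so which actions go into `log_prob` determines which of the two matrices you end up with.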
I thought that OpenAI used the empirical Fisher as well, due to the sampling here: https://github.com/openai/baselines/blob/9ee399f5b20cd70ac0a871927a6cf043b478193f/baselines/acktr/acktr.py#L49
It's a poor naming convention; the sampling does not imply that they're using the empirical Fisher.
I took a closer look. Both implementations use the actual Fisher. Although the actions are stored in a replay buffer, for ACKTR they correspond to samples from the current policy.
I think you're right. It might be best to remove or change the comment here: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/a6afcb3d24f57cdafdbddc305e5213f8444c8678/a2c_ppo_acktr/algo/a2c_acktr.py#L54
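A quick numerical check of the conclusion above (a standalone sketch, not code from either repo): when the stored actions were themselves drawn from the current policy, the "empirical" estimate and a true-Fisher Monte Carlo estimate are the same estimator, just with different random draws.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)
logits = torch.tensor([0.5, -0.2, 0.1], requires_grad=True)

def fisher_diag(actions):
    # Mean of squared per-sample score vectors: the diagonal of an MC Fisher.
    acc = torch.zeros_like(logits)
    for a in actions:
        (g,) = torch.autograd.grad(Categorical(logits=logits).log_prob(a), logits)
        acc += g.pow(2)
    return acc / len(actions)

pi = Categorical(logits=logits)
buffer_actions = pi.sample((2000,))  # actions "stored" during on-policy rollout
fresh_actions = pi.sample((2000,))   # fresh draws from the same current policy

print(fisher_diag(buffer_actions))
print(fisher_diag(fresh_actions))
```

The two printed vectors agree up to sampling noise; they would diverge only if the buffer actions came from a different (e.g. stale or exploratory) distribution than the current policy.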