Closed: yuchen-x closed this issue 4 years ago
Hi,
The sparse reward setting was not considered when developing LIIR. I think the poor performance comes from the fact that the learning of the intrinsic reward relies on the extrinsic reward, which is sparse in your setting. If you are interested, a heuristic reward-shaping method such as RND (which adds an exploration bonus) may be helpful for the sparse reward setting.
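A minimal sketch of what such an RND-style bonus could look like on top of the sparse extrinsic reward (hypothetical PyTorch code, not part of the LIIR repo; obs_dim, the network sizes, and the class name are placeholders):

import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Random Network Distillation sketch: the prediction error of a trained
    predictor against a fixed random target network is used as an intrinsic reward."""
    def __init__(self, obs_dim, emb_dim=64, lr=1e-4):
        super().__init__()
        # Fixed, randomly initialized target network (never trained).
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, emb_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)
        # Predictor network, trained to match the target's output.
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, emb_dim))
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    @torch.no_grad()
    def bonus(self, obs):
        # Per-state squared prediction error, used as the exploration bonus.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

    def update(self, obs):
        # Train the predictor to match the fixed target on visited states.
        loss = (self.predictor(obs) - self.target(obs)).pow(2).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()

The shaped reward would then be something like r = r_ext + beta * rnd.bonus(obs), with beta a small coefficient to tune.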
That "mask_alive" was used to keep only the alive agents when calculating the losses. It should not be a problem related to sparse rewards. It's helpful to check that variable before it is fed into the replay buffer.
Thanks!
Hi,
Thank you very much for the explanation!
mask_alive = 1.0 - avail_actions1[:, :, 0]
I guess you are using each agent's first action (the SMAC no-op) to check whether it is still alive, which works for SMAC but not for general multi-agent envs. In my case, all agents are always alive and all actions are available all the time, so this operation masks out every agent and yields zero gradients.
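To make this concrete, a toy example (made-up shapes and variable names, not the actual liir_learner.py code) of how the expression behaves when action 0 is always available:

import torch

# avail_actions has shape [batch, n_agents, n_actions]; in SMAC action 0 is
# the no-op, available only for dead agents, but here it is always available.
avail_actions = torch.ones(4, 2, 5)

mask_original = 1.0 - avail_actions[:, :, 0]   # all zeros in this env
mask_modified = 1.0 * avail_actions[:, :, 0]   # all ones (the fix below)

log_pi_taken = torch.randn(4, 2, requires_grad=True)
loss = -(log_pi_taken * mask_original).sum()   # identically 0
loss.backward()
print(log_pi_taken.grad.abs().sum())           # tensor(0.) -> no learning signal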
After modifying it to
mask_alive = 1.0 * avail_actions1[:, :, 0]
LIIR achieves reasonable performance.
Thanks!
Hi,
I ran LIIR in the Capture Target domain, where two agents have to capture a moving target simultaneously in a grid world with only a +1 terminal reward. I did a decent amount of hyper-parameter tuning; however, it doesn't learn anything.
I found that "mask_alive" (line 68 of liir_learner.py) made all available actions 0, which causes log_pi_taken (line 99 of liir_learner.py) to end up being 0, so there was no gradient at all. Is this a bug, or do you have any other suggestions?
Thanks!