Hi, I tried your code and ran it multiple times. My agents tend to get stuck at 4, even after more than 10k iterations.
Do you have any insight into what the problem could be?
Sorry about that. I haven't played with this code for a while. Hyperparameters like kl and batchsize make a difference. Increasing batchsize should help, though it will make computation slower.