PPO Training Algorithm and Training Result

Dear Author,

I tried to implement the PPO algorithm to replace the random policy in the given example, but the predator or prey only learns to move in a straight line rather than taking more flexible actions. I may have made a mistake at some stage, but I do not know how to fix it or how to obtain experimental results similar to those in your paper. Could you release more experimental details and show how to implement the training of the prey and predator agents, either together or separately?

I would sincerely appreciate your help and a reply, if possible!

Thank you very much!

michaelkoelle / marl-aquarium