nazaruka / gym-http-api

NSGA2-based Sonic agent + experimental code
MIT License

Behavior vectors across multiple evaluations #28

Open schrum2 opened 5 years ago

schrum2 commented 5 years ago

I recently changed the code so that Sonic first learns for a while with PPO and then switches to an evaluation mode with no learning. The code I took from the PyTorch PPO agent already had such an evaluation function, but it evaluated the agent 10 times. Doing multiple evals probably makes sense, but it would mess up the behavior characterization.

Currently, evaluation.py has a "loop" that runs only once, for 1 eval. I don't want to raise that count without first reconciling the fact that behavior vectors would not align for agents that died at different points in their evaluations.
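One possible way to reconcile vectors of different lengths is sketched below. It is only an illustration, not code from this repository: the helper names `pad_behavior` and `combine_behaviors` are hypothetical, and it assumes that a behavior vector is a flat list of per-step numbers and that an agent that dies early can be treated as frozen at its last recorded value. Each eval's vector is padded to a fixed length, and the padded vectors are then averaged element by element.

```python
from typing import List, Sequence


def pad_behavior(vector: Sequence[float], length: int) -> List[float]:
    """Pad (or truncate) one eval's behavior vector to exactly `length` entries.

    Assumption (not from the repo): after the agent dies, its behavior is
    treated as frozen, so padding repeats the last recorded value.
    An empty vector is padded with 0.0.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    values = list(vector[:length])
    fill = values[-1] if values else 0.0
    return values + [fill] * (length - len(values))


def combine_behaviors(evals: Sequence[Sequence[float]], length: int) -> List[float]:
    """Average several evals' behavior vectors after padding them to `length`."""
    if not evals:
        raise ValueError("need at least one evaluation")
    padded = [pad_behavior(v, length) for v in evals]
    # zip(*padded) walks the padded vectors column by column (one column per step).
    return [sum(column) / len(padded) for column in zip(*padded)]


# Hypothetical data: the second agent died after two recorded steps.
evals = [[1.0, 2.0, 3.0, 4.0], [1.0, 3.0]]
print(combine_behaviors(evals, 4))  # [1.0, 2.5, 3.0, 3.5]
```

Whether repeating the last value is the right padding depends on what the behavior vector records. Padding with zeros or keeping only the longest eval are equally plausible choices, and each would change how agents compare in behavior space.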

Still, 1 eval may be enough, so this issue may not be important.