Behavior vectors across multiple evaluations

I recently changed the code so that Sonic first learns with PPO for a while and then switches to an evaluation mode with no learning. The code I took from the PyTorch PPO agent already had such an evaluation function, but it evaluated the agent 10 times. Running multiple evaluations probably makes sense, but doing so would mess up the behavior characterization.

Currently, the "loop" in evaluation.py only runs once, for a single evaluation. I don't want to increase that count until I have worked out how to reconcile behavior vectors from different evaluations: if an agent dies at a different point in each evaluation, the resulting vectors will not align.
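One possible way to reconcile them is sketched below. It assumes that a behavior vector is a time-ordered list of (x, y) positions sampled at a fixed interval. This is an assumption, since the issue does not say what evaluation.py actually records. Each shorter vector is padded by repeating its last position, which treats a dead agent as staying where it died, and then the evaluations are averaged sample by sample. The names `pad_behavior` and `combine_behaviors` are hypothetical and do not come from the repository.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one (x, y) position per sampling interval.
Point = Tuple[float, float]


def pad_behavior(vec: Sequence[Point], length: int) -> List[Point]:
    """Extend a behavior vector to `length` samples by repeating its final point.

    Assumes the agent stays where it died for the rest of the episode.
    Vectors longer than `length` are truncated.
    """
    if not vec:
        raise ValueError("behavior vector must contain at least one sample")
    if length < 1:
        raise ValueError("target length must be at least 1")
    padded = list(vec[:length])
    # Repeat the last recorded position until the vector reaches `length`.
    padded.extend([padded[-1]] * (length - len(padded)))
    return padded


def combine_behaviors(vectors: Sequence[Sequence[Point]]) -> List[Point]:
    """Average the behavior vectors of several evaluations, sample by sample.

    Every vector is first padded to the length of the longest one, so
    evaluations in which the agent died early still line up index by index.
    """
    if not vectors:
        raise ValueError("need at least one evaluation")
    length = max(len(v) for v in vectors)
    padded = [pad_behavior(v, length) for v in vectors]
    n = len(padded)
    return [
        (sum(p[i][0] for p in padded) / n, sum(p[i][1] for p in padded) / n)
        for i in range(length)
    ]


if __name__ == "__main__":
    eval_a = [(0.0, 0.0), (10.0, 0.0), (20.0, 5.0)]  # survived all three samples
    eval_b = [(0.0, 0.0), (8.0, 0.0)]                # died after two samples
    print(combine_behaviors([eval_a, eval_b]))
```

For the two evaluations in the usage example, the second one is padded to `[(0.0, 0.0), (8.0, 0.0), (8.0, 0.0)]` before averaging. Truncating every vector to the shortest one would be the obvious alternative, but it throws away whatever the longer-lived runs did. Averaging also has a drawback: it can produce a trajectory that no single evaluation actually followed.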

Still, a single evaluation may be enough, in which case this issue may not matter.

nazaruka / gym-http-api, issue #28