Sorry, just to be clear: eval now must only run with pre-trained agents?

I'm not sure we should tie that logic together. Surely there is a use case where I want a pre-trained agent to be used in another agent's training?

I think I'd propose moving the load logic into the pre-trained agents in `experiment.py` and then using the runners as you have set them.

Does the `EvalRunner` have the same logging as what you had in both previous `eval_x` runners?

I also think, just for housekeeping, that we should keep all the runner method calls the same.
> Sorry, just to be clear: eval now must only run with pre-trained agents?

Yes. See my comments below for my take on adding back `agent1: [agent1]_pretrained`.
> I'm not sure we should tie that logic together. Surely there is a use case where I want a pre-trained agent to be used in another agent's training?

Sure, I think we can add this back in if there is a potential use case for it. I didn't think we would need pre-trained agents anymore and should have asked first. Green light to separate them, i.e. adding `agent1: [agent1]_pretrained` back in and keeping `runner: eval`.
> I think I'd propose moving the load logic into the pre-trained agents in `experiment.py` and then using the runners as you have set them.

Sure, adding the loading into the experiment would be fine. It would require adding `agent1: [agent1]_pretrained` back in as an agent.
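A minimal sketch of where that loading could live, assuming a dispatch on the agent name in `experiment.py`; `make_ppo`, `load_checkpoint`, and `model_path` are placeholder names, not Pax's actual API:

```python
# Hypothetical sketch only: move checkpoint loading into experiment.py's
# agent setup, so a *_pretrained agent is a normal agent with restored params.
def agent_setup(args):
    if args.agent1 == "PPO":
        return make_ppo(args)
    if args.agent1 == "PPO_pretrained":
        agent = make_ppo(args)
        # Same agent class, but with parameters restored from the path in the
        # .yaml, so it can still be dropped into another agent's training.
        agent.params = load_checkpoint(args.model_path)
        return agent
    raise ValueError(f"Unknown agent: {args.agent1}")
```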
> Does the `EvalRunner` have the same logging as what you had in both previous `eval_x` runners?

Yes. In the paper, I used what is essentially `EvalRunner` for Coin Game, and I determined that `EvalRunner` also gives the same logging for IPD/IMP.
> I also think, just for housekeeping, that we should keep all the runner method calls the same.

This can be done in one of two ways. The first is to initialize all of the relevant variables before calling any runner, then pass them into every runner, even if a given runner doesn't use a particular variable. For example, you'd need to initialize `param_reshaper` outside of the `if runner == "evo"` conditional, then pass it into both the `rl` and `eval` runners, even though neither of them will use it; a sketch of this follows. Totally fine with that if it is a better way of doing it. The alternative, which I think is worse, is to initialize that stuff inside the `evo_runner`.
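For concreteness, a sketch of the first option, assuming illustrative names (`make_param_reshaper`, `run_loop`) rather than the actual `experiment.py` code:

```python
# Illustrative only: keep every runner invocation identical by creating
# shared objects before the runner branch, even ones only evo needs.
param_reshaper = make_param_reshaper(args)  # only the evo runner uses this

if args.runner == "evo":
    runner = EvoRunner(args)
elif args.runner == "rl":
    runner = RLRunner(args)
elif args.runner == "eval":
    runner = EvalRunner(args)

# Uniform method call: the rl and eval runners receive param_reshaper
# and simply ignore it.
runner.run_loop(env, agents, watchers, param_reshaper)
```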
### Summary

Combines evaluation into a single runner. To evaluate a model, specify the model path in the `.yaml` file and set `runner: eval`.
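As a hypothetical example of such a config (the `model_path` key name is illustrative; use whatever key `runner_eval.py` actually reads):

```yaml
# Sketch of an eval config; model_path is a placeholder key name.
runner: eval
agent1: PPO_memory
model_path: path/to/saved/model   # trained model to evaluate
```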
### Background

Evaluation was previously run on IPD/IMP using `evaluation_ipd.py` and on Coin Game using `runner_pretrained.py`. The reason `evaluation_cg.py` was not used arose from a divergence in the code. `runner_pretrained.py` is more structurally similar to the current training runners found in `runner_rl.py` and `runner_evo.py`, so it is preferred moving forward.

### Objective
Create an `EvalRunner` that works for IPD, IMP, and Coin Game in a single file called `runner_eval.py`, and remove `agent1: [agent]_pretrained` in favor of assigning `agent1: [agent]` and `runner: eval`.

### Changes
- Renames `runner_pretrained.py` to `runner_eval.py` for consistency
- Removes `evaluation_cg.py` and `evaluation_ipd.py`
- Removes `agent1: [PPO_memory_pretrained, PPO_pretrained, MFOS_pretrained]` in favor of `agent1: [PPO_memory, PPO, MFOS]` and `runner: eval`
### TODO

- Rename `runner_pretrained.py` to `runner_eval.py` for consistency
- Test `runner_eval.py` by creating a test `.yaml` file
- Remove `evaluation_cg.py` and `evaluation_ipd.py` in favor of `runner_eval.py`
- Remove `agent1: [PPO_memory_pretrained, PPO_pretrained, MFOS_pretrained]` in favor of `agent1: [PPO_memory, PPO, MFOS]` and `runner: eval`
- Investigate why setting `num_steps = num_inner_steps` does not give the same results as `evaluation_ipd.py`. Specifically, one must set `num_steps=10,000` and `num_inner_steps=100` to achieve the same results. However, we want to be able to set `num_steps=100`, `num_inner_steps=100`, and `total_timesteps=10,000` so we can see what is happening at each episode in `runner_eval.py` (see the arithmetic sketched after this list).
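For reference, one way to read the numbers in that last item (plain arithmetic, not repo code):

```python
# Desired eval setting: rollout length equals episode length, so results
# are logged once per episode instead of once per 10,000-step rollout.
num_steps = 100          # steps collected per rollout
num_inner_steps = 100    # steps per episode
total_timesteps = 10_000

episodes_per_rollout = num_steps // num_inner_steps  # 1
num_rollouts = total_timesteps // num_steps          # 100
print(num_rollouts * episodes_per_rollout)           # 100 logged episodes
```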
### Upcoming PR

- `num_seeds` from configs
- `pre_train.yaml`
### Example

**Evaluating IPD**

```bash
python -m pax.experiment +experiment/ipd=earl_v_tabular ++wandb.log=True ++num_envs=1 ++num_devices=1 ++num_steps=100 ++num_inner_steps=100 ++total_timesteps=10000 ++wandb.name="testing_delete_me" ++runner=eval
```

**Evaluating IMP**

```bash
python -m pax.experiment +experiment/mp=earl_v_tabular ++wandb.log=True ++num_envs=1 ++num_devices=1 ++num_steps=100 ++num_inner_steps=100 ++total_timesteps=10000 ++seed=0 ++wandb.name="testing_delete_me" ++runner=eval
```

**Evaluating Coin Game**

```bash
python -m pax.experiment +experiment/cg=earl_v_ppo_memory ++wandb.log=True ++num_envs=100 ++num_devices=1 ++num_steps=16 ++total_timesteps=9600 ++seed=0 ++wandb.name="testing_delete_me" ++runner=eval
```