Sorry, just to be clear: eval now must only run with pre-trained agents?

I'm not sure we should tie that logic together. Surely there is a use case where I want a pre-trained agent to be used in another agent's training?

I think I'd propose moving the load logic into the pre-trained agents in `experiment.py` and then using the runners as you have set them.

Does the `EvalRunner` have the same logging as what you had in both previous `eval_x` runners?

I also think, just for housekeeping, that we should keep all the runner method calls the same.
> Sorry, just to be clear: eval now must only run with pre-trained agents?

Yes. See my comments below for my take on adding back `agent1: [agent1]_pretrained`.
> I'm not sure we should tie that logic together. Surely there is a use case where I want a pre-trained agent to be used in another agent's training?

Sure, I think we can add this back in if there is a potential use case for it. I didn't think we would need pre-trained agents anymore and should have asked first. Green light to separate them, i.e. adding `agent1: [agent1]_pretrained` back in and keeping `runner: eval`.
> I think I'd propose moving the load logic into the pre-trained agents in `experiment.py` and then using the runners as you have set them.

Sure, adding the loading into the experiment would be fine. It would require adding `agent1: [agent1]_pretrained` back in as an agent.
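A minimal sketch of where that loading could live, assuming a dispatch on the agent name in `experiment.py`; `make_ppo`, `load_checkpoint`, and `model_path` are placeholder names, not Pax's actual API:

```python
# Hypothetical sketch only: move checkpoint loading into experiment.py's
# agent setup, so a *_pretrained agent is a normal agent with restored params.
def agent_setup(args):
    if args.agent1 == "PPO":
        return make_ppo(args)
    if args.agent1 == "PPO_pretrained":
        agent = make_ppo(args)
        # Same agent class, but with parameters restored from the path in the
        # .yaml, so it can still be dropped into another agent's training.
        agent.params = load_checkpoint(args.model_path)
        return agent
    raise ValueError(f"Unknown agent: {args.agent1}")
```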
> Does the `EvalRunner` have the same logging as what you had in both previous `eval_x` runners?

Yes. In the paper, I used what is essentially `EvalRunner` for Coin Game, and I determined that `EvalRunner` also gives the same logging for IPD/IMP.
> I also think, just for housekeeping, that we should keep all the runner method calls the same.

This can be done in one of two ways. The first is to initialize all of the relevant variables before calling any runner, then pass them into every runner, even if a given runner doesn't use a particular variable. For example, you'd need to initialize `param_reshaper` outside of the `if runner == "evo"` conditional, then pass it into both the `rl` and `eval` runners, even though neither of them will use it; a sketch of this follows. Totally fine with that if it is a better way of doing it. The alternative, which I think is worse, is to initialize that stuff inside the `evo_runner`.
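For concreteness, a sketch of the first option, assuming illustrative names (`make_param_reshaper`, `run_loop`) rather than the actual `experiment.py` code:

```python
# Illustrative only: keep every runner invocation identical by creating
# shared objects before the runner branch, even ones only evo needs.
param_reshaper = make_param_reshaper(args)  # only the evo runner uses this

if args.runner == "evo":
    runner = EvoRunner(args)
elif args.runner == "rl":
    runner = RLRunner(args)
elif args.runner == "eval":
    runner = EvalRunner(args)

# Uniform method call: the rl and eval runners receive param_reshaper
# and simply ignore it.
runner.run_loop(env, agents, watchers, param_reshaper)
```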
### Summary

Combines evaluation into a single runner. To evaluate a model, specify the model path in the `.yaml` file and set `runner: eval`.
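As a hypothetical example of such a config (the `model_path` key name is illustrative; use whatever key `runner_eval.py` actually reads):

```yaml
# Sketch of an eval config; model_path is a placeholder key name.
runner: eval
agent1: PPO_memory
model_path: path/to/saved/model   # trained model to evaluate
```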
### Background

Evaluation was previously run on IPD/IMP using `evaluation_ipd.py` and on Coin Game using `runner_pretrained.py`. The reason `evaluation_cg.py` was not used arose from a divergence in the code. `runner_pretrained.py` is more structurally similar to the current training runners found in `runner_rl.py` and `runner_evo.py`, so it is preferred moving forward.

### Objective
Create an `EvalRunner` that works for IPD, IMP, and Coin Game in a single file called `runner_eval.py`, and remove `agent1: [agent]_pretrained` in favor of assigning `agent1: [agent]` and `runner: eval`.

### Changes
- Renames `runner_pretrained.py` to `runner_eval.py` for consistency
- Removes `evaluation_cg.py` and `evaluation_ipd.py`
- Removes `agent1: [PPO_memory_pretrained, PPO_pretrained, MFOS_pretrained]` in favor of `agent1: [PPO_memory, PPO, MFOS]` and `runner: eval`
### TODO

- Rename `runner_pretrained.py` to `runner_eval.py` for consistency
- Test `runner_eval.py` by creating a test `.yaml` file
- Remove `evaluation_cg.py` and `evaluation_ipd.py` in favor of `runner_eval.py`
- Remove `agent1: [PPO_memory_pretrained, PPO_pretrained, MFOS_pretrained]` in favor of `agent1: [PPO_memory, PPO, MFOS]` and `runner: eval`
- Investigate why setting `num_steps = num_inner_steps` does not give the same results as `evaluation_ipd.py`. Specifically, one must set `num_steps=10,000` and `num_inner_steps=100` to achieve the same results. However, we want to be able to set `num_steps=100`, `num_inner_steps=100`, and `total_timesteps=10,000` so we can see what is happening at each episode in `runner_eval.py` (see the arithmetic sketched after this list).
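For reference, one way to read the numbers in that last item (plain arithmetic, not repo code):

```python
# Desired eval setting: rollout length equals episode length, so results
# are logged once per episode instead of once per 10,000-step rollout.
num_steps = 100          # steps collected per rollout
num_inner_steps = 100    # steps per episode
total_timesteps = 10_000

episodes_per_rollout = num_steps // num_inner_steps  # 1
num_rollouts = total_timesteps // num_steps          # 100
print(num_rollouts * episodes_per_rollout)           # 100 logged episodes
```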
### Upcoming PR

- `num_seeds` from configs
- `pre_train.yaml`
### Example

**Evaluating IPD**

```bash
python -m pax.experiment +experiment/ipd=earl_v_tabular ++wandb.log=True ++num_envs=1 ++num_devices=1 ++num_steps=100 ++num_inner_steps=100 ++total_timesteps=10000 ++wandb.name="testing_delete_me" ++runner=eval
```

**Evaluating IMP**

```bash
python -m pax.experiment +experiment/mp=earl_v_tabular ++wandb.log=True ++num_envs=1 ++num_devices=1 ++num_steps=100 ++num_inner_steps=100 ++total_timesteps=10000 ++seed=0 ++wandb.name="testing_delete_me" ++runner=eval
```

**Evaluating Coin Game**

```bash
python -m pax.experiment +experiment/cg=earl_v_ppo_memory ++wandb.log=True ++num_envs=100 ++num_devices=1 ++num_steps=16 ++total_timesteps=9600 ++seed=0 ++wandb.name="testing_delete_me" ++runner=eval
```