rlberry-py / rlberry

An easy-to-use reinforcement learning library for research and education.
https://rlberry-py.github.io/rlberry
MIT License

bug in evaluations.py #441

Closed. KohlerHECTOR closed this issue 6 months ago.

KohlerHECTOR commented 6 months ago

In rlberry/manager/evaluations.py we do:

# list all the fits of this experiment
exp_files = (agent_dir / Path("agent_handlers")).iterdir()
nfit = len(list(exp_files))

So after fitting an ExperimentManager with n_fit=k, if the agents are saved, the agent_handlers folder under rlberry_data will contain k .pickle files AND k .zip files for the saved models, so nfit comes out as 2k instead of k. We should instead do nfit = len(list(exp_files)) // 2 (integer division), or count only the pickle files: nfit = len([1 for e in exp_files if str(e).endswith(".pickle")])
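
A minimal sketch of the pickle-counting fix, assuming exp_files yields pathlib.Path objects (which iterdir() does), so the suffix attribute can replace splitting on ".":

from pathlib import Path

# list all the fits of this experiment, ignoring the .zip model archives
# that get saved next to the .pickle agent handlers
exp_files = (agent_dir / Path("agent_handlers")).iterdir()
nfit = len([f for f in exp_files if f.suffix == ".pickle"])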

KohlerHECTOR commented 6 months ago

To reproduce the bug:

from rlberry.envs import gym_make
from rlberry.agents.stable_baselines import StableBaselinesAgent 
from stable_baselines3 import A2C, PPO
from rlberry.manager import ExperimentManager
env_id = "CartPole-v1"  # Id of the environment

env_ctor = gym_make  # constructor for the env
env_kwargs = dict(id=env_id)  # give the id of the env inside the kwargs

first_agent = ExperimentManager(
    StableBaselinesAgent,  # Agent class
    init_kwargs=dict(algo_cls=A2C),
    train_env=(env_ctor, env_kwargs),  # Environment as Tuple(constructor,kwargs)
    seed=42,
    fit_budget=int(1e4),
    n_fit=10,
    agent_name="Sb3-A2C",
)
second_agent = ExperimentManager(
    StableBaselinesAgent,  # Agent class
    init_kwargs=dict(algo_cls=PPO),
    train_env=(env_ctor, env_kwargs),  # Environment as Tuple(constructor,kwargs)
    seed=42,
    fit_budget=int(1e4),
    n_fit=10,
    agent_name="Sb3-PPO",
)
first_agent.fit()
second_agent.fit()
from rlberry.manager import plot_writer_data
plot_writer_data("rlberry_data/temp", "reward")
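
With n_fit=10 above, the miscount can also be checked directly; a hypothetical snippet (the exact location of agent_handlers under rlberry_data is an assumption, adjust the path to the actual experiment output):

from pathlib import Path

# Hypothetical path: point this at the experiment's agent_handlers directory.
handlers_dir = Path("rlberry_data/temp")  # replace with .../agent_handlers
all_files = list(handlers_dir.iterdir())
pickle_files = [f for f in all_files if f.suffix == ".pickle"]
print(len(all_files), len(pickle_files))  # expected: 20 entries vs the intended 10 fits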