n30111 opened this issue 4 months ago (status: Open)
@n30111 Thanks for raising this issue. I could reproduce it; the problem appears to be somewhere in the old API stack. I can run the example without errors using the new stack:
from ray import train, tune
from ray.tune.tuner import Tuner

stopping_criteria = {"training_iteration": 2}

param_space = {
    "env": "LunarLander-v2",
    "env_config": {"continuous": True},
    "enable_rl_module_and_learner": True,
    "enable_env_runner_and_connector_v2": True,
    "kl_coeff": 1.0,
    "num_workers": 0,
    "num_cpus": 0.5,  # number of CPUs to use per trial
    "num_gpus": 0,  # number of GPUs to use per trial
    "lambda": 0.95,
    "clip_param": 0.2,
    "lr": 1e-4,
    "evaluation_interval": 1,
    "evaluation_duration": 6,
    "evaluation_num_env_runners": 1,
}

tuner = Tuner(
    "PPO",
    tune_config=tune.TuneConfig(
        metric="env_runners/episode_return_mean",
        mode="max",
        num_samples=1,
    ),
    param_space=param_space,
    run_config=train.RunConfig(stop=stopping_criteria),
)
result_grid = tuner.fit()

res = result_grid._experiment_analysis  # pylint: disable=protected-access
print(res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"])
assert (
    param_space["evaluation_duration"]
    == res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"]
)
Maybe this is an alternative for you?
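For reference, the same new-stack setup can also be expressed through the AlgorithmConfig builder API. This is only a sketch of roughly equivalent settings (the per-trial CPU/GPU numbers from the dict above are omitted), not something that was run as part of this thread:

from ray import train, tune
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: roughly the same settings as the param_space dict above.
config = (
    PPOConfig()
    .environment("LunarLander-v2", env_config={"continuous": True})
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .env_runners(num_env_runners=0)
    .training(lr=1e-4, kl_coeff=1.0, lambda_=0.95, clip_param=0.2)
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=6,
        evaluation_num_env_runners=1,
    )
)

tuner = tune.Tuner(
    "PPO",
    tune_config=tune.TuneConfig(
        metric="env_runners/episode_return_mean", mode="max", num_samples=1
    ),
    param_space=config.to_dict(),
    run_config=train.RunConfig(stop={"training_iteration": 2}),
)
result_grid = tuner.fit()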
We are dependent on the old stack.
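As a stop-gap while staying on the old stack, a minimal sketch (assuming the mismatch is caused by episodes being spread across several evaluation workers, which this thread does not confirm) is to read the actually reported episode count from the trial results rather than asserting it equals evaluation_duration, and to pick a duration that divides evenly by evaluation_num_env_runners. The helper names below are illustrative, not part of Ray:

import math


def rounded_evaluation_duration(desired_episodes: int, num_eval_env_runners: int) -> int:
    """Round the desired episode count up to a multiple of the evaluation workers.

    Assumption (not confirmed in this thread): if episodes are split across
    evaluation workers, an evenly divisible duration may keep the reported
    num_episodes closer to the configured evaluation_duration.
    """
    return math.ceil(desired_episodes / num_eval_env_runners) * num_eval_env_runners


def reported_eval_episodes(result_grid) -> int:
    """Read the actually reported episode count from a finished Tune run."""
    last_result = result_grid._experiment_analysis.trials[0].last_result  # pylint: disable=protected-access
    return last_result["evaluation"]["env_runners"]["num_episodes"]


print(rounded_evaluation_duration(desired_episodes=6, num_eval_env_runners=2))  # -> 6
print(rounded_evaluation_duration(desired_episodes=7, num_eval_env_runners=2))  # -> 8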
What happened + What you expected to happen
When using evaluation_num_env_runners > 1 for RLlib evaluation, the reported results["evaluation"]["env_runners"]["num_episodes"] is not equal to the evaluation_duration set in the configuration.

Versions / Dependencies
Ray 2.31, Python 3.11, Linux
Reproduction script
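The reporter's original script is not included in this copy of the issue. Purely as an illustration, a minimal sketch that should hit the reported path on the old API stack would mirror the example above, but without the new-stack flags and with more than one evaluation env runner; all values below are assumptions, not the reporter's actual settings:

# Illustrative sketch only -- not the reporter's original script.
from ray import train, tune

param_space = {
    "env": "LunarLander-v2",
    "env_config": {"continuous": True},
    # Old API stack: the new-stack enable_* flags are deliberately not set.
    "lr": 1e-4,
    "evaluation_interval": 1,
    "evaluation_duration": 6,
    "evaluation_num_env_runners": 2,  # > 1, which triggers the reported mismatch
}

tuner = tune.Tuner(
    "PPO",
    tune_config=tune.TuneConfig(num_samples=1),
    param_space=param_space,
    run_config=train.RunConfig(stop={"training_iteration": 2}),
)
result_grid = tuner.fit()

last_result = result_grid._experiment_analysis.trials[0].last_result  # pylint: disable=protected-access
num_episodes = last_result["evaluation"]["env_runners"]["num_episodes"]
# Reported behavior: num_episodes differs from the configured evaluation_duration.
print(num_episodes, param_space["evaluation_duration"])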
Issue Severity
Medium: It is a significant difficulty but I can work around it.