Closed PhilippWillms closed 1 month ago
The same root error message occurs at a different level:
```python
.framework("torch")
.resources(num_gpus=1)  # num_cpus_for_main_process=1 should be relevant for Tune
.learners(num_learners=0, num_gpus_per_learner=1)
.env_runners(
    num_env_runners=4,
    num_cpus_per_env_runner=1,
    batch_mode="complete_episodes",
)
.rl_module(
    model_config_dict={
        "post_fcnet_hiddens": [64, 64],
        "post_fcnet_activation": "relu",
    },
    rl_module_spec=SingleAgentRLModuleSpec(
        module_class=ActionMaskingTorchRLModule,
    ),
)
.evaluation(
    evaluation_num_env_runners=1,
    evaluation_interval=1,
    evaluation_parallel_to_training=True,
)
```
```
File c:\Users\Philipp\anaconda3\envs\py311-raynew\Lib\site-packages\ray\rllib\algorithms\algorithm.py:956, in Algorithm.step(self)
    948 # Parallel eval + training: Kick off evaluation-loop and parallel train() call.
    ...
    106 _, batch = self._preprocess_batch(batch)
    107 # Call the super's method to compute values for GAE.
--> 108 return super()._compute_values(batch)

AttributeError: 'super' object has no attribute '_compute_values'
```
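The mechanics of this error can be reproduced outside RLlib. The class names below are illustrative stand-ins, not RLlib's actual classes: if the base class renames a method, a subclass that still calls the old name through `super()` fails at call time.

```python
class Base:
    # The base class exposes only the NEW method name.
    def compute_values(self, batch):
        return batch

class Child(Base):
    # The subclass still uses the OLD name and delegates upward.
    def _compute_values(self, batch):
        # super() only sees the new name, so this raises AttributeError.
        return super()._compute_values(batch)

try:
    Child()._compute_values([1.0])
except AttributeError as exc:
    print(exc)  # 'super' object has no attribute '_compute_values'
```

This matches the trace above: the custom module defines and calls `_compute_values`, but the parent class in newer RLlib versions no longer has a method of that name.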
Also occurs in the nightly build, downloaded at 8:30 p.m. CEST.
@simonsays1980, @sven1977: Happens both with the trainer API (i.e. `config.build().train()`) and with the Tune API.
Also happens with the shipped example, running it from the CLI via `python action_masking_rlm.py`.
Also happens if `evaluation_parallel_to_training=True` is NOT set in the config. The issue always occurs at the first evaluation step.
Modify your class' `_compute_values` method to `compute_values`.
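Applied to a custom module, the fix is just the rename. A minimal sketch, using a hypothetical stand-in base class rather than RLlib's actual one:

```python
class Base:
    # Stand-in for the RLlib base class, which exposes compute_values
    # (the new name on the new API stack).
    def compute_values(self, batch):
        return batch

# Before (breaks at the first evaluation step):
#     def _compute_values(self, batch):
#         ...
#         return super()._compute_values(batch)

# After the rename, overriding and delegating both work:
class MaskedModule(Base):
    def compute_values(self, batch):
        # e.g. preprocess the batch here, then defer to the base implementation
        return super().compute_values(batch)

print(MaskedModule().compute_values([1.0, 2.0]))  # [1.0, 2.0]
```

The key point is that the override and the `super()` call must both use the name the current base class actually defines.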
@grizzlybearg: Sounds easy to implement for my own custom RLModules, but it should also be changed in the central repo managed by the Ray team.
I had the same problem with action masking on the new API stack with RLlib 2.35; the solution proposed by @grizzlybearg works for me too. Thanks!
> Modify your class' `_compute_values` method to `compute_values`.

in `examples/rl_modules/classes/action_masking_rlm.py`
@PhilippWillms Nice catch! Thanks a ton! Fixed in the related PR - waiting for tests to pass.
What happened + What you expected to happen
To the best of my knowledge, the repro script covers the version of the `action_masking_rlm.py` example file shipped in release 2.34. However, I adjusted the `resources`, `learners`, and `env_runners` config to my requirements. My assumption is that the config did not properly recognize that I want to use the GPU of the "main" / local cluster.
Complete error stack trace:
Versions / Dependencies
ray==2.34, torch==2.3.1+cu118, gymnasium==0.28.1, Windows 11
Reproduction script
Issue Severity
High: It blocks me from completing my task.