ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] Running DreamerV3 fails at self.module.set_state(state[COMPONENT_RL_MODULE][DEFAULT_MODULE_ID]) with KeyError: 'rl_module' #47527

Open Small93 opened 2 months ago

Small93 commented 2 months ago

What happened + What you expected to happen

I want to try DreamerV3 (https://github.com/ray-project/ray/tree/master/rllib/algorithms/dreamerv3), so I ran:

$ cd ray/rllib/tuned_examples/dreamerv3/
$ python atari_100k.py --env ALE/Pong-v5

But something went wrong, as shown below:

ray::DreamerV3.train() (pid=2422480, ip=10.0.0.3, actor_id=a3b258ad3ffe57374ab8633d01000000, repr=DreamerV3)
  File "/data/xx/anaconda3/envs/py311tf/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "/data/xx/anaconda3/envs/py311tf/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 328, in train
    result = self.step()
             ^^^^^^^^^^^
  File "/data/xx/anaconda3/envs/py311tf/lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 969, in step
    self.env_runner_group.sync_env_runner_states(
  File "/data/xx/anaconda3/envs/py311tf/lib/python3.11/site-packages/ray/rllib/env/env_runner_group.py", line 395, in sync_env_runner_states
    self.local_env_runner.set_state(
  File "/data/xx/anaconda3/envs/py311tf/lib/python3.11/site-packages/ray/rllib/algorithms/dreamerv3/utils/env_runner.py", line 555, in set_state
    self.module.set_state(state[COMPONENT_RL_MODULE][DEFAULT_MODULE_ID])
                              ~^^^^^^^^^^^^^^^^^^^^^
KeyError: 'rl_module'

I don't know how to deal with it. Please help, thank you.
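One way to read the traceback: the state dict handed to set_state appears to have no 'rl_module' entry, so the nested lookup fails on the first key. The snippet below is only a sketch of that failure mode using plain dicts, not RLlib code. The two constants are local stand-ins ('rl_module' is taken from the KeyError, 'default_policy' is an assumption), and extract_module_state is a hypothetical helper that does not exist in RLlib.

```python
# Local stand-ins for the constants named in the traceback.
# "rl_module" comes from the KeyError message; "default_policy" is an assumption.
COMPONENT_RL_MODULE = "rl_module"
DEFAULT_MODULE_ID = "default_policy"


def extract_module_state(state):
    """Return the default module's state, or None if the dict has none.

    Hypothetical helper for illustration only; not part of RLlib.
    """
    rl_module_state = state.get(COMPONENT_RL_MODULE)
    if rl_module_state is None:
        return None
    return rl_module_state.get(DEFAULT_MODULE_ID)


# A state dict that lacks the "rl_module" entry, as the KeyError suggests.
state_without_module = {"some_other_component": {}}

try:
    # Same shape of lookup as the failing line in the traceback.
    state_without_module[COMPONENT_RL_MODULE][DEFAULT_MODULE_ID]
except KeyError as err:
    print(f"KeyError: {err}")

# The guarded lookup returns None instead of raising.
print(extract_module_state(state_without_module))
```

Whether skipping the update is the right behavior when the key is missing is a separate question; the sketch only shows where the exception comes from.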

Versions / Dependencies

python==3.11.9 ray==2.35.0 tensorflow==2.17.0 tensorflow-probability==0.24.0 tf-keras==2.17.0 torch==2.4.1

Reproduction script

I tested with the unmodified code from the Git repository.


Issue Severity

None

jackvice commented 1 month ago

I am getting the same KeyError: 'rl_module' with the following versions: python==3.11.9 ray==2.36.0 tensorflow==2.12.0 tensorflow-probability==0.19.0

GuillermoHijano commented 1 month ago

Hi, I am getting the same error. Was it fixed?

Mark2000 commented 4 days ago

@Small93 @jackvice @GuillermoHijano It appears that adding --num-env-runners 1 (or whatever number you want) to the CLI command resolves this error.
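Applied to the command from the original report, that would presumably be: python atari_100k.py --env ALE/Pong-v5 --num-env-runners 1. For anyone launching the script from Python, here is a small sketch that assembles the same command line. build_command is a hypothetical helper, and rejecting values below 1 is an assumption based only on the workaround reported in this thread.

```python
import shlex


def build_command(env="ALE/Pong-v5", num_env_runners=1):
    """Assemble the atari_100k.py command with an explicit env runner count.

    Hypothetical helper; the script name and flags are the ones quoted in this thread.
    """
    if num_env_runners < 1:
        # Assumption: the workaround reported here uses a positive count.
        raise ValueError("num_env_runners must be at least 1")
    return [
        "python",
        "atari_100k.py",
        "--env",
        env,
        "--num-env-runners",
        str(num_env_runners),
    ]


# Print the command as it would be typed in a shell.
print(shlex.join(build_command()))
```

The resulting list can be passed to subprocess.run from inside ray/rllib/tuned_examples/dreamerv3/ if you prefer not to type the command by hand.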