Closed wildsky95 closed 2 years ago
Hey @wildsky95, this should be an easy fix: your "policies_to_learn" key should be "policies_to_train".
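To explain the error: your RandomPolicy does no postprocessing (it does not define a postprocess_trajectory method), so advantages are never calculated for its batches. Because "policies_to_learn" is not a recognized key, RLlib presumably falls back to treating every policy as trainable, and PPO then fails when it tries to standardize the missing "advantages" field in the RandomPolicy batches.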
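The failure mode described above can be sketched in plain Python. This is a simplified stand-in, not RLlib's actual code: the function name `standardize_fields`, the policy IDs, and the batch contents are all made up for illustration. It assumes that only batches of trainable policies reach the standardization step, which is where the traceback ends (`batch[field] = standardized(batch[field])`).

```python
import statistics


def standardize_fields(policy_batches, trainable, fields=("advantages",)):
    """Simplified stand-in for a per-policy standardization step (NOT RLlib code)."""
    out = {}
    for policy_id, batch in policy_batches.items():
        if policy_id not in trainable:
            continue  # batches of policies that are not trained are dropped
        batch = dict(batch)
        for field in fields:
            values = batch[field]  # KeyError if postprocessing never added the field
            mean = statistics.fmean(values)
            std = statistics.pstdev(values) or 1.0  # avoid division by zero
            batch[field] = [(v - mean) / std for v in values]
        out[policy_id] = batch
    return out


# Hypothetical batches: only the PPO policy's postprocessing added "advantages".
batches = {
    "ppo_policy": {"rewards": [1.0, 0.0, 1.0], "advantages": [0.5, -0.5, 1.5]},
    "random_policy": {"rewards": [0.0, 1.0, 0.0]},
}

# Only the PPO policy is trained: the random policy's batch is never touched.
ok = standardize_fields(batches, trainable={"ppo_policy"})

# Every policy is treated as trainable (the assumed effect of the misspelled key).
try:
    standardize_fields(batches, trainable={"ppo_policy", "random_policy"})
    failed_key = None
except KeyError as err:
    failed_key = err.args[0]

print(failed_key)
```

With the random policy excluded from training, its batches never reach the step that expects "advantages", which matches the fix suggested above.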
We should do the following things:
Closing this now. I can confirm that your script runs well after changing "policies_to_learn" to "policies_to_train".
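For reference, a minimal sketch of what the corrected "multiagent" section might look like. The reproduction script is not shown in this thread, so the policy IDs ("ppo_policy", "random_policy") and the mapping function are hypothetical, and the policy specs are left out rather than guessed.

```python
# Hypothetical mapping: agent 0 learns with PPO, every other agent acts randomly.
def policy_mapping_fn(agent_id, *args, **kwargs):
    return "ppo_policy" if agent_id == 0 else "random_policy"


config = {
    "multiagent": {
        # "policies": {...} would go here, unchanged from the original script.
        "policy_mapping_fn": policy_mapping_fn,
        # Was: "policies_to_learn": ["ppo_policy"]
        "policies_to_train": ["ppo_policy"],
    },
}

print(sorted(config["multiagent"]))
```

Only the key name changes; the list of policy IDs to train stays the same.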
@wildsky95 : https://github.com/ray-project/ray/pull/21448
Ah, I also just noticed that this is RLlib's fault. The example script has this wrong, but the mistake goes undetected there b/c the script uses PG (which does not compute advantages) rather than PPO. Fixed in the above PR. Thanks again!
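A misspelled key like this one can be caught before training with a small user-side check. The sketch below is not part of RLlib: the helper `find_unknown_keys` is hypothetical, and `KNOWN_MULTIAGENT_KEYS` is an assumed, non-exhaustive list of keys that should be verified against the installed Ray version.

```python
import difflib

# Assumed, non-exhaustive subset of "multiagent" keys; verify against your Ray version.
KNOWN_MULTIAGENT_KEYS = [
    "policies",
    "policy_mapping_fn",
    "policies_to_train",
    "observation_fn",
    "replay_mode",
    "count_steps_by",
]


def find_unknown_keys(multiagent_config):
    """Return {unknown_key: closest_known_key_or_None} for unrecognized keys."""
    problems = {}
    for key in multiagent_config:
        if key not in KNOWN_MULTIAGENT_KEYS:
            matches = difflib.get_close_matches(key, KNOWN_MULTIAGENT_KEYS, n=1)
            problems[key] = matches[0] if matches else None
    return problems


problems = find_unknown_keys({
    "policy_mapping_fn": None,
    "policies_to_learn": ["ppo_policy"],
})
print(problems)
```

Running such a check on the config dict before calling tune.run would flag "policies_to_learn" and point to the likely intended key.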
Search before asking
Ray Component
Ray Tune
What happened + What you expected to happen
Hi, I'm trying to use PPO with tune.run in a custom multi-agent environment, and I get a KeyError: 'advantages'. Is this a bug, or how should I solve it? I tried changing the version, and the error appears every time. With the PG agent, my custom environment runs correctly, but with the PPO agent I get this error:
2021-12-12 21:26:34,243 ERROR trial_runner.py:924 -- Trial PPO_multi_agent_d19e6_00000: Error processing event.
Traceback (most recent call last):
  File "/home/wildsky/Dropbox/NLP_CO/marl_test/test.py", line 93, in <module>
    tune.run(**exp_dict, fail_fast="raise")
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/tune/tune.py", line 607, in run
    runner.step()
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 705, in step
    self._process_events(timeout=timeout)
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 866, in _process_events
    self._process_trial(trial)
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 893, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 718, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/worker.py", line 1728, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(KeyError): ray::PPOTrainer.train() (pid=129494, ip=192.168.1.103, repr=PPOTrainer)
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/tune/trainable.py", line 315, in train
    result = self.step()
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 929, in step
    raise e
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 911, in step
    result = self.step_attempt()
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 983, in step_attempt
    step_results = next(self.train_exec_impl)
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/util/iter.py", line 756, in __next__
    return next(self.built_iterator)
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/util/iter.py", line 791, in apply_foreach
    result = fn(item)
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/rllib/execution/rollout_ops.py", line 266, in __call__
    batch[field] = standardized(batch[field])
  File "/home/wildsky/My_Venv/DRL/lib/python3.8/site-packages/ray/rllib/policy/sample_batch.py", line 712, in __getitem__
    value = dict.__getitem__(self, key)
KeyError: 'advantages'
Versions / Dependencies
version 1.8, 1.9, 2.0
Reproduction script