Closed · XiaomuWang closed this issue 1 year ago
This is a Python KeyError: 'distance_error'. From the code and the traceback, it looks like the dictionary key 'distance_error' is looked up while processing the traffic-simulation data, but the key does not exist (or was removed), which causes the program to terminate unexpectedly. You need to examine your code logic further to find and fix the problem.
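If it helps, here is a minimal sketch of a more defensive version of that callback. It is only an illustration under assumptions: the class name is made up, the signature and the calls (DefaultCallbacks, episode.get_agents(), episode.last_info_for(), episode.user_data) come from RLlib's standard Episode/callbacks API rather than the project's actual training_utils.py, and it guards both lookups, since the KeyError in the traceback can come either from episode.user_data["distance_error"] or from info["distance_error"].

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class SafeDistanceErrorCallbacks(DefaultCallbacks):
    """Hypothetical sketch: record 'distance_error' only when it is present."""

    def on_episode_step(self, *, worker, base_env, policies=None, episode, env_index=None, **kwargs):
        for k in episode.get_agents():
            info = episode.last_info_for(k) or {}
            # Skip steps where the env did not report the metric instead of
            # raising KeyError and killing the rollout worker.
            if "distance_error" not in info:
                continue
            # Lazily create the per-agent list so a missing user_data entry
            # cannot raise KeyError either.
            episode.user_data.setdefault("distance_error", {}).setdefault(k, []).append(
                info["distance_error"]
            )

Whether silently skipping those steps is acceptable, or whether the environment should be changed so it always emits the metric, depends on why the info dict omits the key in the first place.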
Thank you for pointing this out!
Thank you so much for your work. When I run

python run_rl_training.py --exp-name rl_test1 --num-gpus 3 --dataset_train /workspace/datasets/generated_1385_training/1385_training/ --dataset_test /workspace/datasets/generated_1385_training/validation

I get the following error:

Failure # 1 (occurred at 2023-05-24_02-06-55)
ray::PPO.train() (pid=18313, ip=172.17.0.15, repr=PPO)
  File "/opt/conda/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 367, in train
    raise skipped from exception_cause(skipped)
  File "/opt/conda/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 364, in train
    result = self.step()
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 749, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 2623, in _run_one_training_iteration
    results = self.training_step()
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 319, in training_step
    worker_set=self.workers, max_env_steps=self.config.train_batch_size
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in synchronous_parallel_sample
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 696, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 73, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(KeyError): ray::RolloutWorker.apply() (pid=19718, ip=172.17.0.15, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fef0612f6d0>)
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
    raise e
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 174, in apply
    return func(self, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 900, in sample
    batches = [self.input_reader.next()]
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 285, in get_data
    item = next(self._env_runner)
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 685, in _env_runner
    sample_collector=sample_collector,
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 1012, in _process_observations
    env_index=env_id,
  File "/workspace/wangs/scenarios_gen/trafficgen/trafficgen/utils/training_utils.py", line 476, in on_episode_step
    episode.user_data["distance_error"][k].append(info["distance_error"])
KeyError: 'distance_error'