ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

RLlib unbatch() function in ray/rllib/utils/spaces/space_utils.py called from ray/rllib/evaluation/sampler.py causing errors #28670

Closed cwfparsonson closed 1 year ago

cwfparsonson commented 1 year ago

What happened + What you expected to happen

I have a custom multi-agent environment with a nested action space dict of the form:

Dict(<class 'nmmo.io.action.Attack'>:Dict(<class 'nmmo.io.action.Style'>:Discrete(3), <class 'nmmo.io.action.Target'>:Discrete(100)), <class 'nmmo.io.action.Buy'>:Dict(<class 'nmmo.io.action.Item'>:Discrete(170)), <class 'nmmo.io.action.Comm'>:Dict(<class 'nmmo.io.action.Token'>:Discrete(170)), <class 'nmmo.io.action.Move'>:Dict(<class 'nmmo.io.action.Direction'>:Discrete(4)), <class 'nmmo.io.action.Sell'>:Dict(<class 'nmmo.io.action.Item'>:Discrete(170), <class 'nmmo.io.action.Price'>:Discrete(100)), <class 'nmmo.io.action.Use'>:Dict(<class 'nmmo.io.action.Item'>:Discrete(170)))
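
For readers unfamiliar with the nmmo types above, here is a minimal sketch of the same structure in plain gym.spaces (string keys standing in for the nmmo action classes used as keys here):

from gym import spaces

# Hypothetical string-keyed equivalent of the nested nmmo action space above.
action_space = spaces.Dict({
    "Attack": spaces.Dict({"Style": spaces.Discrete(3), "Target": spaces.Discrete(100)}),
    "Buy": spaces.Dict({"Item": spaces.Discrete(170)}),
    "Comm": spaces.Dict({"Token": spaces.Discrete(170)}),
    "Move": spaces.Dict({"Direction": spaces.Discrete(4)}),
    "Sell": spaces.Dict({"Item": spaces.Discrete(170), "Price": spaces.Discrete(100)}),
    "Use": spaces.Dict({"Item": spaces.Discrete(170)}),
})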

I then have a custom policy which returns actions of the form:

{<class 'nmmo.io.action.Move'>: {<class 'nmmo.io.action.Direction'>: <class 'nmmo.io.action.West'>}}

I am then loading this custom environment and agent into an RLlib trainer and calling trainer.train(). The actions and observations are computed fine, but the unbatch() function in ray/rllib/utils/spaces/space_utils.py fails to unbatch my actions when called from ray/rllib/evaluation/sampler.py:

RayTaskError(TypeError)                   Traceback (most recent call last)
Input In [43], in <cell line: 2>()
      1 # perform one training epoch
----> 2 results = trainer.train()
      3 print(f'Completed RLlib epoch!!!')

File /scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/tune/trainable/trainable.py:347, in Trainable.train(self)
    345     self._warmup_time = time.time() - self._start_time
    346 start = time.time()
--> 347 result = self.step()
    348 assert isinstance(result, dict), "step() needs to return a dict."
    350 # We do not modify internal state nor update this result if duplicate.

File /scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py:661, in Algorithm.step(self)
    653     (
    654         results,
    655         train_iter_ctx,
    656     ) = self._run_one_training_iteration_and_evaluation_in_parallel()
    657 # - No evaluation necessary, just run the next training iteration.
    658 # - We have to evaluate in this training iteration, but no parallelism ->
    659 #   evaluate after the training iteration is entirely done.
    660 else:
--> 661     results, train_iter_ctx = self._run_one_training_iteration()
    663 # Sequential: Train (already done above), then evaluate.
    664 if evaluate_this_iter and not self.config["evaluation_parallel_to_training"]:

File /scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py:2378, in Algorithm._run_one_training_iteration(self)
   2376         # In case of any failures, try to ignore/recover the failed workers.
   2377         except Exception as e:
-> 2378             num_recreated += self.try_recover_from_step_attempt(
   2379                 error=e,
   2380                 worker_set=self.workers,
   2381                 ignore=self.config["ignore_worker_failures"],
   2382                 recreate=self.config["recreate_failed_workers"],
   2383             )
   2384     results["num_recreated_workers"] = num_recreated
   2386 return results, train_iter_ctx

File /scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py:2185, in Algorithm.try_recover_from_step_attempt(self, error, worker_set, ignore, recreate)
   2176     # Error out.
   2177     else:
   2178         logger.warning(
   2179             "Worker crashed during training or evaluation! "
   2180             "To try to continue without failed "
   (...)
   2183             "`recreate_failed_workers=True`."
   2184         )
-> 2185         raise error
   2186 # Any other exception.
   2187 else:
   2188     # Allow logs messages to propagate.
   2189     time.sleep(0.5)

File /scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py:2373, in Algorithm._run_one_training_iteration(self)
   2371 with self._timers[TRAINING_ITERATION_TIMER]:
   2372     if self.config["_disable_execution_plan_api"]:
-> 2373         results = self.training_step()
   2374     else:
   2375         results = next(self.train_exec_impl)

File /scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/algorithms/ppo/ppo.py:407, in PPO.training_step(self)
    403     train_batch = synchronous_parallel_sample(
    404         worker_set=self.workers, max_agent_steps=self.config["train_batch_size"]
    405     )
    406 else:
--> 407     train_batch = synchronous_parallel_sample(
    408         worker_set=self.workers, max_env_steps=self.config["train_batch_size"]
    409     )
    410 train_batch = train_batch.as_multi_agent()
    411 self._counters[NUM_AGENT_STEPS_SAMPLED] += train_batch.agent_steps()

File /scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/execution/rollout_ops.py:100, in synchronous_parallel_sample(worker_set, max_agent_steps, max_env_steps, concat)
     97     sample_batches = [worker_set.local_worker().sample()]
     98 # Loop over remote workers' `sample()` method in parallel.
     99 else:
--> 100     sample_batches = ray.get(
    101         [worker.sample.remote() for worker in worker_set.remote_workers()]
    102     )
    103 # Update our counters for the stopping criterion of the while loop.
    104 for b in sample_batches:

File /scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/_private/client_mode_hook.py:105, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
    103     if func.__name__ != "init" or is_client_mode_enabled_by_default:
    104         return getattr(ray, func.__name__)(*args, **kwargs)
--> 105 return func(*args, **kwargs)

File /scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/_private/worker.py:2275, in get(object_refs, timeout)
   2273     worker.core_worker.dump_object_store_memory_usage()
   2274 if isinstance(value, RayTaskError):
-> 2275     raise value.as_instanceof_cause()
   2276 else:
   2277     raise value

RayTaskError(TypeError): ray::RolloutWorker.sample() (pid=2519017, ip=128.40.41.23, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fb381c2ba90>)
  File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 806, in sample
    batches = [self.input_reader.next()]
  File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 282, in get_data
    item = next(self._env_runner)
  File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 719, in _env_runner
    ] = _process_policy_eval_results(
  File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 1281, in _process_policy_eval_results
    actions: List[EnvActionType] = unbatch(actions)
  File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/utils/spaces/space_utils.py", line 211, in unbatch
    for batch_pos in range(len(flat_batches[0])):
TypeError: object of type 'IterableNameComparable' has no len()
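
For reference, the failure seems reproducible with a minimal sketch (assuming unbatch() flattens the nested action dict with dm_tree, as the traceback suggests; the IterableNameComparable below is a hypothetical stand-in for the nmmo type):

import tree  # dm_tree, which space_utils.py appears to use for flattening

class IterableNameComparable:
    # Stand-in for the nmmo type: iterable, but defines no __len__.
    def __iter__(self):
        return iter(())

actions = {"Move": {"Direction": IterableNameComparable()}}
flat_batches = tree.flatten(actions)  # leaves are the custom objects, not arrays
len(flat_batches[0])  # TypeError: object of type 'IterableNameComparable' has no len()

unbatch() treats each flattened leaf as a batch of per-environment values and indexes into it, so every leaf needs a length.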

Is this error thrown because my custom environment's actions are custom objects rather than torch or numpy arrays? Does anyone have an idea how to begin fixing this issue so that trainer.train() can handle my custom actions?

Versions / Dependencies

ray 2.0.0, Python 3.9.0

Reproduction script

N/A

Issue Severity

High: It blocks me from completing my task.

stale[bot] commented 1 year ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public Slack channel.

ArturNiederfahrenhorst commented 1 year ago

Hi @cwfparsonson ,

IterableNameComparable is not implemented by RLlib but is part of the external library you use. space_utils.py needs an iterable that actually implements the __len__ method. IterableNameComparable does not seem to implement it, and we can't change that in RLlib. As far as I can see, IterableNameComparable is part of Neural MMO, so you'll have to open a feature request with Joseph Suarez over at Neural MMO.
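
One possible workaround in the meantime (an untested sketch, not an RLlib-endorsed fix): have your custom policy return batched numpy index leaves instead of nmmo class objects, and map them back to nmmo actions inside the environment:

import numpy as np

# Hypothetical: suppose West is index 3 in the Discrete(4) Direction space.
# A batch of one action whose leaves have a length, as unbatch() expects:
actions = {"Move": {"Direction": np.array([3])}}

Your env's step() can then translate the index back to nmmo.io.action.West before applying it.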

I'm closing this issue since we can't do anything here. Feel free to reopen if you disagree or for any other reason 🙂