ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] self_play_with_open_spiel example fails with use_lstm=True #40583

Open RobinKa opened 12 months ago

RobinKa commented 12 months ago

What happened + What you expected to happen

Running self_play_with_open_spiel with use_lstm=True fails when a new policy is added after the win-rate threshold is exceeded.

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/examples/self_play_with_open_spiel.py", line 302, in <module>
    ).fit()
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/tuner.py", line 372, in fit
    return self._local_tuner.fit()
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 579, in fit
    analysis = self._fit_internal(trainable, param_space)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 699, in _fit_internal
    analysis = run(
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/tune.py", line 1103, in run
    runner.step()
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/execution/tune_controller.py", line 850, in step
    if not self._actor_manager.next(timeout=0.1):
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/air/execution/_internal/actor_manager.py", line 224, in next
    self._actor_task_events.resolve_future(future)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 113, in resolve_future
    on_error(e)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/air/execution/_internal/actor_manager.py", line 770, in on_error
    self._actor_task_failed(
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/air/execution/_internal/actor_manager.py", line 289, in _actor_task_failed
    tracked_actor_task._on_error(tracked_actor, exception)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/execution/tune_controller.py", line 1423, in _on_error
    raise e
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/execution/tune_controller.py", line 1416, in _on_error
    on_error(trial, exception)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/execution/tune_controller.py", line 1499, in _trial_task_failure
    raise exception
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.train() (pid=17444, ip=172.28.145.109, actor_id=660f7a8f7655c4f5a531564a01000000, repr=PPO)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 400, in train
    raise skipped from exception_cause(skipped)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 397, in train
    result = self.step()
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 853, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2838, in _run_one_training_iteration
    results = self.training_step()
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 429, in training_step
    train_batch = synchronous_parallel_sample(
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/execution/rollout_ops.py", line 85, in synchronous_parallel_sample
    sample_batches = worker_set.foreach_worker(
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 680, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 76, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(ValueError): ray::RolloutWorker.apply() (pid=17543, ip=172.28.145.109, actor_id=41c5dfbd5e1a5bac8df83a7901000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f663317dfc0>)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/tree/__init__.py", line 435, in map_structure
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/tree/__init__.py", line 435, in <listcomp>
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 1747, in _concat_values
    return np.concatenate(values, axis=1 if time_major else 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 16 and the array at index 1 has size 20

During handling of the above exception, another exception occurred:

ray::RolloutWorker.apply() (pid=17543, ip=172.28.145.109, actor_id=41c5dfbd5e1a5bac8df83a7901000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f663317dfc0>)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 185, in apply
    raise e
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
    return func(self, *args, **kwargs)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 723, in sample
    batch = concat_samples(batches)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 1582, in concat_samples
    return concat_samples_into_ma_batch(samples)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 1731, in concat_samples_into_ma_batch
    out[key] = concat_samples(batches)
  File "/home/tora/.cache/pypoetry/virtualenvs/warlock-rl-m4PJjqcF-py3.10/lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 1652, in concat_samples
    raise ValueError(
ValueError: Cannot concat data under key 'obs', b/c sub-structures under that key don't match. `samples`=[SampleBatch(202 (seqs=19): ['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'terminateds', 'truncateds', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'state_in', 'state_out', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'values_bootstrapped', 'advantages', 'value_targets']), SampleBatch(202 (seqs=19): ['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'terminateds', 'truncateds', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'state_in', 'state_out', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'values_bootstrapped', 'advantages', 'value_targets']), SampleBatch(199 (seqs=17): ['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'terminateds', 'truncateds', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'state_in', 'state_out', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'values_bootstrapped', 'advantages', 'value_targets']), SampleBatch(200 (seqs=18): ['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'terminateds', 'truncateds', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'state_in', 'state_out', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'values_bootstrapped', 'advantages', 'value_targets']), SampleBatch(198 (seqs=18): ['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'terminateds', 'truncateds', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'state_in', 'state_out', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'values_bootstrapped', 'advantages', 'value_targets'])]
 Original error: 
 all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 16 and the array at index 1 has size 20

Versions / Dependencies

Ray 2.7.1 Python 3.10.6

Reproduction script

Add "use_lstm"=True to the model config of self_play_with_open_spiel, run it and wait for the win rate threshold to be exceeded so a new policy is created (can lower the threshold for quicker reproduction).

This also happens with other environments; I first noticed it in my own environment, which reuses most of the example's code, and the same failure occurs there.

Issue Severity

High: It blocks me from completing my task.

jfurches commented 11 months ago

I encountered the same problem when running a custom environment and an LSTM-based model on Ray 2.8.0. I had it print out the batch shapes, and with a bit of cleanup, my error is:

Failure # 1 (occurred at 2023-11-10_19-43-36)
ray::PPO.train() (pid=10060, ip=10.91.0.26, actor_id=c9854846c60d32b8fc828da401000000, repr=PPO)
  File "ray/tune/trainable/trainable.py", line 342, in train
    raise skipped from exception_cause(skipped)
  File "ray/tune/trainable/trainable.py", line 339, in train
    result = self.step()
  File "ray/rllib/algorithms/algorithm.py", line 853, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "ray/rllib/algorithms/algorithm.py", line 2854, in _run_one_training_iteration
    results = self.training_step()
  File "ray/rllib/algorithms/ppo/ppo.py", line 429, in training_step
    train_batch = synchronous_parallel_sample(
  File "ray/rllib/execution/rollout_ops.py", line 101, in synchronous_parallel_sample
    full_batch = concat_samples(all_sample_batches)
  File "ray/rllib/policy/sample_batch.py", line 1580, in concat_samples
    return concat_samples_into_ma_batch(samples)
  File "ray/rllib/policy/sample_batch.py", line 1731, in concat_samples_into_ma_batch
    out[key] = concat_samples(batches)
  File "ray/rllib/policy/sample_batch.py", line 1651, in concat_samples
    raise ValueError(
ValueError: Cannot concat data under key 'obs', b/c sub-structures under that key don't match. `samples`=[SampleBatch(150 (seqs=5): [... snip ...]), SampleBatch(150 (seqs=7): []), SampleBatch(150 (seqs=4): []), SampleBatch(150 (seqs=2): []), SampleBatch(150 (seqs=3): []), SampleBatch(150 (seqs=3): []), SampleBatch(150 (seqs=2): []), SampleBatch(150 (seqs=3): []), SampleBatch(150 (seqs=6): []), SampleBatch(150 (seqs=1): []), SampleBatch(150 (seqs=7): []), SampleBatch(150 (seqs=3): []), SampleBatch(150 (seqs=6): []), SampleBatch(150 (seqs=2): []), SampleBatch(150 (seqs=5): []), SampleBatch(150 (seqs=4): []), SampleBatch(150 (seqs=2): []), SampleBatch(150 (seqs=4): []), SampleBatch(150 (seqs=2): []), SampleBatch(150 (seqs=2): []), SampleBatch(150 (seqs=10): []), SampleBatch(150 (seqs=2): []), SampleBatch(150 (seqs=2): []), SampleBatch(150 (seqs=5): []), SampleBatch(150 (seqs=7): []), SampleBatch(150 (seqs=4): []), SampleBatch(150 (seqs=8): [])]
 Original error:
 all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 99 and the array at index 1 has size 38

Batch shapes: [(5, 99, 1430), (7, 38, 1430), (4, 63, 1430), (2, 122, 1430), (3, 69, 1430), (3, 102, 1430), (2, 104, 1430), (3, 88, 1430), (6, 64, 1430), (1, 150, 1430), (7, 44, 1430), (3, 110, 1430), (6, 36, 1430), (2, 126, 1430), (5, 70, 1430), (4, 99, 1430), (2, 96, 1430), (4, 79, 1430), (2, 123, 1430), (2, 148, 1430), (10, 21, 1430), (2, 132, 1430), (2, 117, 1430), (5, 55, 1430), (7, 51, 1430), (4, 120, 1430), (8, 49, 1430)]
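For reference, one way to inspect per-policy batch shapes is an RLlib callback like the sketch below (a hypothetical debugging aid, not necessarily how the shapes above were obtained; class and hook names follow the Ray 2.x callbacks API):

```python
import numpy as np
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class PrintBatchShapes(DefaultCallbacks):
    """Debugging aid: print the 'obs' shape of every sampled batch."""

    def on_sample_end(self, *, worker, samples, **kwargs):
        # `samples` is a MultiAgentBatch (per-policy batches) or a plain SampleBatch.
        # Assumes a simple (non-dict) observation space, as in the OpenSpiel example.
        if hasattr(samples, "policy_batches"):
            for pid, batch in samples.policy_batches.items():
                print(pid, np.asarray(batch["obs"]).shape)
        else:
            print(np.asarray(samples["obs"]).shape)


# Attach via the algorithm config, e.g. `config.callbacks(PrintBatchShapes)`.
```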

The relevant piece of code is

https://github.com/ray-project/ray/blob/8af874e834f8ae9c2bbfa8a9c76d434781c2048c/rllib/policy/sample_batch.py#L1674-L1686

My guess is that it assumes the T dimension matches across all the tensors when concatenating along B. My understanding is that this would work if `s.zero_padded` were True, since all batches would then have T = max_seq_len, but at least in my case it isn't. Should the sample batches be zero-padded so they all have the same T dimension (not necessarily max_seq_len, but perhaps the maximum T across the batches)? Or do I have some configuration wrong that is causing this mismatch?
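To illustrate the padding idea (a hedged numpy sketch of the workaround suggested above, not RLlib's actual logic): right-pad each batch's T axis to the longest T before concatenating along B.

```python
import numpy as np


def pad_and_concat(arrays):
    """Concatenate (B_i, T_i, F) arrays along B after zero-padding T to the max T_i."""
    max_t = max(a.shape[1] for a in arrays)
    padded = [
        np.pad(a, ((0, 0), (0, max_t - a.shape[1]), (0, 0)))  # zero-pad the T axis only
        for a in arrays
    ]
    return np.concatenate(padded, axis=0)


# Shapes like the ones in the report above:
batches = [np.ones((5, 99, 1430)), np.ones((7, 38, 1430))]
print(pad_and_concat(batches).shape)  # (12, 99, 1430)
```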

simonsays1980 commented 9 months ago

@RobinKa Thanks for posting this. I can replicate @jfurches's error on ray==2.7.1. However, I cannot replicate it on ray-nightly. With the nightly install I can run the example for a long time without any error, even though the win-rate threshold has long been exceeded. I guess this error has already been fixed.

Could you try the latest release or the nightly one?