[RLlib] - Apex DQN unable to run on CartPole due to ReplayBuffer API change

ThomasCassimon commented 1 year ago

What happened + What you expected to happen

Recent changes to the replay buffer APIs cause Apex DQN to crash when it tries to add a sample to its replay buffer.

The reproduction script below uses TensorFlow and CartPole-v1, but I have observed the same behaviour with PyTorch and a custom environment.

When I run the reproduction script below, I get the following output:

2023-02-15 15:14:59,877 INFO worker.py:1538 -- Started a local Ray instance.
(pid=61688) 
(ApexDQN pid=61688) 2023-02-15 15:15:03,503 WARNING algorithm_config.py:488 -- Cannot create ApexDQNConfig from given `config_dict`! Property __stdout_file__ not supported.
(ApexDQN pid=61688) 2023-02-15 15:15:03,503 INFO algorithm_config.py:2503 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(ApexDQN pid=61688) 2023-02-15 15:15:03,506 INFO algorithm.py:501 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(ApexDQN pid=61688) 2023-02-15 15:15:03,535 WARNING env.py:159 -- Your env reset() method appears to take 'seed' or 'return_info' arguments. Note that these are not yet supported in RLlib. Seeding will take place using 'env.seed()' and the info dict will not be returned from reset.
(pid=61777) 
(pid=61778) 
(pid=61775) 
(pid=61776) 
== Status ==
Current time: 2023-02-15 15:15:07 (running for 00:00:06.61)
Memory usage on this node: 10.6/15.3 GiB 
Using FIFO scheduling algorithm.
Resources requested: 5.0/20 CPUs, 0/1 GPUs, 0.0/5.27 GiB heap, 0.0/2.63 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/thomas/ray_results/ApexDQN_2023-02-15_15-14-57
Number of trials: 1/1 (1 RUNNING)
+---------------------------------+----------+-------------------+
| Trial name                      | status   | loc               |
|---------------------------------+----------+-------------------|
| ApexDQN_CartPole-v1_2251f_00000 | RUNNING  | 10.0.10.103:61688 |
+---------------------------------+----------+-------------------+

(ApexDQN pid=61688) 2023-02-15 15:15:07,604 ERROR worker.py:400 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::MultiAgentPrioritizedReplayBuffer.apply() (pid=61778, ip=10.0.10.103)
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/algorithms/apex_dqn/apex_dqn.py", line 426, in <lambda>
(ApexDQN pid=61688)     lambda actor: actor.add_batch(batch),
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/deprecation.py", line 115, in _ctor
(ApexDQN pid=61688)     deprecation_warning(
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/deprecation.py", line 43, in deprecation_warning
(ApexDQN pid=61688)     raise DeprecationWarning(msg)
(ApexDQN pid=61688) DeprecationWarning: `add_batch` has been deprecated. Use `ReplayBuffer.add()` instead.
(ApexDQN pid=61688) 
(ApexDQN pid=61688) During handling of the above exception, another exception occurred:
(ApexDQN pid=61688) 
(ApexDQN pid=61688) ray::MultiAgentPrioritizedReplayBuffer.apply() (pid=61778, ip=10.0.10.103)
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
(ApexDQN pid=61688)     if self.config.recreate_failed_workers:
(ApexDQN pid=61688) AttributeError: 'MultiAgentPrioritizedReplayBuffer' object has no attribute 'config'
(ApexDQN pid=61688) 2023-02-15 15:15:07,673 ERROR worker.py:400 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::MultiAgentPrioritizedReplayBuffer.apply() (pid=61775, ip=10.0.10.103)
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/algorithms/apex_dqn/apex_dqn.py", line 426, in <lambda>
(ApexDQN pid=61688)     lambda actor: actor.add_batch(batch),
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/deprecation.py", line 115, in _ctor
(ApexDQN pid=61688)     deprecation_warning(
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/deprecation.py", line 43, in deprecation_warning
(ApexDQN pid=61688)     raise DeprecationWarning(msg)
(ApexDQN pid=61688) DeprecationWarning: `add_batch` has been deprecated. Use `ReplayBuffer.add()` instead.
(ApexDQN pid=61688) 
(ApexDQN pid=61688) During handling of the above exception, another exception occurred:
(ApexDQN pid=61688) 
(ApexDQN pid=61688) ray::MultiAgentPrioritizedReplayBuffer.apply() (pid=61775, ip=10.0.10.103)
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
(ApexDQN pid=61688)     if self.config.recreate_failed_workers:
(ApexDQN pid=61688) AttributeError: 'MultiAgentPrioritizedReplayBuffer' object has no attribute 'config'
(ApexDQN pid=61688) 2023-02-15 15:15:07,871 ERROR worker.py:400 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::MultiAgentPrioritizedReplayBuffer.apply() (pid=61777, ip=10.0.10.103)
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/algorithms/apex_dqn/apex_dqn.py", line 426, in <lambda>
(ApexDQN pid=61688)     lambda actor: actor.add_batch(batch),
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/deprecation.py", line 115, in _ctor
(ApexDQN pid=61688)     deprecation_warning(
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/deprecation.py", line 43, in deprecation_warning
(ApexDQN pid=61688)     raise DeprecationWarning(msg)
(ApexDQN pid=61688) DeprecationWarning: `add_batch` has been deprecated. Use `ReplayBuffer.add()` instead.
(ApexDQN pid=61688) 
(ApexDQN pid=61688) During handling of the above exception, another exception occurred:
(ApexDQN pid=61688) 
(ApexDQN pid=61688) ray::MultiAgentPrioritizedReplayBuffer.apply() (pid=61777, ip=10.0.10.103)
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
(ApexDQN pid=61688)     if self.config.recreate_failed_workers:
(ApexDQN pid=61688) AttributeError: 'MultiAgentPrioritizedReplayBuffer' object has no attribute 'config'
(ApexDQN pid=61688) 2023-02-15 15:15:07,903 ERROR worker.py:400 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::MultiAgentPrioritizedReplayBuffer.apply() (pid=61776, ip=10.0.10.103)
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/algorithms/apex_dqn/apex_dqn.py", line 426, in <lambda>
(ApexDQN pid=61688)     lambda actor: actor.add_batch(batch),
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/deprecation.py", line 115, in _ctor
(ApexDQN pid=61688)     deprecation_warning(
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/deprecation.py", line 43, in deprecation_warning
(ApexDQN pid=61688)     raise DeprecationWarning(msg)
(ApexDQN pid=61688) DeprecationWarning: `add_batch` has been deprecated. Use `ReplayBuffer.add()` instead.
(ApexDQN pid=61688) 
(ApexDQN pid=61688) During handling of the above exception, another exception occurred:
(ApexDQN pid=61688) 
(ApexDQN pid=61688) ray::MultiAgentPrioritizedReplayBuffer.apply() (pid=61776, ip=10.0.10.103)
(ApexDQN pid=61688)   File "/home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
(ApexDQN pid=61688)     if self.config.recreate_failed_workers:
(ApexDQN pid=61688) AttributeError: 'MultiAgentPrioritizedReplayBuffer' object has no attribute 'config'
== Status ==
Current time: 2023-02-15 15:15:12 (running for 00:00:11.61)
Memory usage on this node: 10.7/15.3 GiB 
Using FIFO scheduling algorithm.
Resources requested: 5.0/20 CPUs, 0/1 GPUs, 0.0/5.27 GiB heap, 0.0/2.63 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/thomas/ray_results/ApexDQN_2023-02-15_15-14-57
Number of trials: 1/1 (1 RUNNING)
+---------------------------------+----------+-------------------+
| Trial name                      | status   | loc               |
|---------------------------------+----------+-------------------|
| ApexDQN_CartPole-v1_2251f_00000 | RUNNING  | 10.0.10.103:61688 |
+---------------------------------+----------+-------------------+

^C2023-02-15 15:15:13,507   WARNING tune.py:690 -- Stop signal received (e.g. via SIGINT/Ctrl+C), ending Ray Tune run. This will try to checkpoint the experiment state one last time. Press CTRL+C (or send SIGINT/SIGKILL/SIGTERM) to skip. 
== Status ==
Current time: 2023-02-15 15:15:17 (running for 00:00:16.62)
Memory usage on this node: 10.6/15.3 GiB 
Using FIFO scheduling algorithm.
Resources requested: 5.0/20 CPUs, 0/1 GPUs, 0.0/5.27 GiB heap, 0.0/2.63 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/thomas/ray_results/ApexDQN_2023-02-15_15-14-57
Number of trials: 1/1 (1 RUNNING)
+---------------------------------+----------+-------------------+
| Trial name                      | status   | loc               |
|---------------------------------+----------+-------------------|
| ApexDQN_CartPole-v1_2251f_00000 | RUNNING  | 10.0.10.103:61688 |
+---------------------------------+----------+-------------------+

== Status ==
Current time: 2023-02-15 15:15:17 (running for 00:00:16.62)
Memory usage on this node: 10.6/15.3 GiB 
Using FIFO scheduling algorithm.
Resources requested: 5.0/20 CPUs, 0/1 GPUs, 0.0/5.27 GiB heap, 0.0/2.63 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/thomas/ray_results/ApexDQN_2023-02-15_15-14-57
Number of trials: 1/1 (1 RUNNING)
+---------------------------------+----------+-------------------+
| Trial name                      | status   | loc               |
|---------------------------------+----------+-------------------|
| ApexDQN_CartPole-v1_2251f_00000 | RUNNING  | 10.0.10.103:61688 |
+---------------------------------+----------+-------------------+

2023-02-15 15:15:17,712 ERROR tune.py:758 -- Trials did not complete: [ApexDQN_CartPole-v1_2251f_00000]
2023-02-15 15:15:17,713 INFO tune.py:762 -- Total run time: 16.83 seconds (16.62 seconds for the tuning loop).
2023-02-15 15:15:17,713 WARNING tune.py:768 -- Experiment has been interrupted, but the most recent state was saved. You can continue running this experiment by passing `resume=True` to `tune.run()`

At some point in the output, you can see the line DeprecationWarning: `add_batch` has been deprecated. Use `ReplayBuffer.add()` instead.

This raises an exception in the worker, which propagates up to Ray's exception handling. That handler tries to access a config attribute on the object that raised the exception (a MultiAgentPrioritizedReplayBuffer), which fails with AttributeError: 'MultiAgentPrioritizedReplayBuffer' object has no attribute 'config'.
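The chain of two exceptions described above can be illustrated without Ray. The following is a minimal sketch, not RLlib's actual code: `ToyReplayBuffer`, `deprecated`, and `reproduce` are hypothetical stand-ins, and the shape of `apply()` is only inferred from the traceback (an error handler that reads `self.config` on an object that has no such attribute).

```python
def deprecated(new_name):
    """Hypothetical stand-in for a hard-deprecation decorator: any call raises."""
    def wrap(func):
        def _ctor(*args, **kwargs):
            raise DeprecationWarning(
                f"`{func.__name__}` has been deprecated. Use `{new_name}` instead."
            )
        return _ctor
    return wrap


class ToyReplayBuffer:
    """Stand-in for the buffer actor. Note that it defines no `config` attribute."""

    @deprecated("ReplayBuffer.add()")
    def add_batch(self, batch):
        pass

    def apply(self, func):
        # Assumed shape of the handler, based only on the traceback:
        # the except branch itself reads self.config, which does not exist here.
        try:
            return func(self)
        except Exception:
            if self.config.recreate_failed_workers:
                raise
            raise


def reproduce():
    """Trigger the deprecated call through apply() and return the final error."""
    buffer = ToyReplayBuffer()
    try:
        buffer.apply(lambda actor: actor.add_batch({"obs": [0.0]}))
    except AttributeError as err:
        return err
    return None


err = reproduce()
print(type(err).__name__, "-", err)
print("raised while handling:", repr(err.__context__))
```

In this sketch the AttributeError is what surfaces to the caller, and the original DeprecationWarning is only reachable through the exception's `__context__`, which matches the "During handling of the above exception, another exception occurred" section of the log.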

After this point, the Tune trial gets stuck: Tune reports the trial as running, but no progress is made.

I have recreated the circumstances for this bug in the reproduction script below (using the throw_replay_buffer_error function) and can confirm that this throws an exception and doesn't print Bla!.

Expected behaviour: I expect to be able to train an Apex DQN agent on the CartPole problem without crashes.

Versions / Dependencies

OS:

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:    22.04
Codename:   jammy

Ray version:

Name: ray
Version: 2.2.0
Summary: Ray provides a simple, universal API for building distributed applications.
Home-page: https://github.com/ray-project/ray
Author: Ray Team
Author-email: ray-dev@googlegroups.com
License: Apache 2.0
Location: /home/thomas/PycharmProjects/VirtualEnvironment/lib/python3.10/site-packages
Requires: aiosignal, attrs, click, filelock, frozenlist, grpcio, jsonschema, msgpack, numpy, packaging, protobuf, pyyaml, requests, virtualenv
Required-by:

Python version:

Python 3.10.6

Reproduction script

from ray import tune  # needed for tune.Tuner below
from ray.tune import TuneConfig
from ray.air import RunConfig
from ray.rllib.algorithms.apex_dqn import ApexDQN
from ray.rllib.utils.replay_buffers import MultiAgentPrioritizedReplayBuffer
from ray.rllib import SampleBatch

def throw_replay_buffer_error():
    buffer: MultiAgentPrioritizedReplayBuffer = MultiAgentPrioritizedReplayBuffer()

    buffer.add_batch(SampleBatch())

    print("Bla!")

def main() -> int:
    # throw_replay_buffer_error()

    param_space = ApexDQN.get_default_config().environment(
        env="CartPole-v1"
    ).resources(
        num_gpus=0,
        num_cpus_per_worker=0,
        num_gpus_per_worker=0,
        num_cpus_for_local_worker=1
    ).rollouts(
        num_rollout_workers=0
    )

    run_config = RunConfig()
    run_config.stop = {"episodes_total": 1}

    tune_config = TuneConfig()

    tuner = tune.Tuner(
        trainable=ApexDQN,
        param_space=param_space,
        run_config=run_config,
        tune_config=tune_config
    )

    result = tuner.fit()

    return 0

if __name__ == "__main__":
    exit(main())

Issue Severity

High: It blocks me from completing my task.

avnishn commented 1 year ago

Thanks for bringing this up, @ThomasCassimon. We'll try to get a fix out for this in the coming week.

ThomasCassimon commented 1 year ago

Quick note: looking at the release notes for 2.3, it seems that the exception thrown was changed to ValueError (see: https://github.com/ray-project/ray/pull/30255).

ThomasCassimon commented 1 year ago

@avnishn do you expect this fix will make it into the next release of Ray (2.4), or will it take longer to fix this issue?

avnishn commented 1 year ago

I'm attempting to repro your bug right now. If it is in fact a bug, the fix will land in 2.4.

avnishn commented 1 year ago

OK, the fix is up. Thank you for making such a great repro script, @ThomasCassimon.
