ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io

[rllib] Memory leak in environment worker in multi-agent setup #9964

Closed sergeivolodin closed 3 years ago

sergeivolodin commented 4 years ago

What is the problem?

When training in a multi-agent environment with multiple environment workers, the memory usage of the workers grows steadily and is not released after policy updates.

[screenshot: per-worker memory usage growing over time]

== Status ==
Memory usage on this node: 5.2/8.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/10 CPUs, 0/0 GPUs, 0.0/5.03 GiB heap, 0.0/1.71 GiB objects
Result logdir: /home/sergei/ray_results/dummy_run
Number of trials: 1 (1 ERROR)
+------------------------------------+----------+-------+--------+------------------+---------+----------+
| Trial name                         | status   | loc   |   iter |   total time (s) |      ts |   reward |
|------------------------------------+----------+-------+--------+------------------+---------+----------|
| PPO_DummyMultiAgentEnv_fbe89_00000 | ERROR    |       |     57 |          4463.08 | 1881000 |        0 |
+------------------------------------+----------+-------+--------+------------------+---------+----------+
Number of errored trials: 1
+------------------------------------+--------------+---------------------------------------------------------------------------------------------------+
| Trial name                         |   # failures | error file                                                                                        |
|------------------------------------+--------------+---------------------------------------------------------------------------------------------------|
| PPO_DummyMultiAgentEnv_fbe89_00000 |            1 | /home/sergei/ray_results/dummy_run/PPO_DummyMultiAgentEnv_0_2020-08-05_16-21-42pm5cl_th/error.txt |
+------------------------------------+--------------+---------------------------------------------------------------------------------------------------+
ray.exceptions.RayTaskError(RayOutOfMemoryError): ray::RolloutWorker.par_iter_next() (pid=7604, ip=192.168.175.153)
  File "python/ray/_raylet.pyx", line 408, in ray._raylet.execute_task
  File "/home/sergei/miniconda3/envs/fresh_ray/lib/python3.8/site-packages/ray/memory_monitor.py", line 137, in raise_if_low_memory
    raise RayOutOfMemoryError(
ray.memory_monitor.RayOutOfMemoryError: Heap memory usage for ray_RolloutWorker_7604 is 0.4395 / 0.4395 GiB limit

If no memory limit is set, the worker processes exhaust system memory and are killed. If the memory_per_worker limit is set, they exceed the limit and are killed.

Ray version and other system information (Python version, TensorFlow version, OS):

Name                 Value
Ray version          ray==0.8.6
Python version       Python 3.8.5 (default, Aug 5 2020, 08:36:46)
TensorFlow version   tensorflow==2.3.0
OS                   Ubuntu 18.04.4 LTS (GNU/Linux 4.19.121-microsoft-standard x86_64) (same thing in non-WSL as well)

Reproduction

Run this and measure memory consumption. If you remove memory_per_worker limits, it will take longer, as workers will try to consume all system memory.

import numpy as np
import ray
import tensorflow as tf
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.agents.ppo.ppo_tf_policy import PPOTFPolicy
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from gym.spaces import Box
from ray.tune.registry import register_env

def dim_to_gym_box(dim, val=np.inf):
    """Create gym.Box with specified dimension."""
    high = np.full((dim,), fill_value=val)
    return Box(low=-high, high=high)

class DummyMultiAgentEnv(MultiAgentEnv):
    """Return zero observations."""

    def __init__(self, config):
        del config  # Unused
        super(DummyMultiAgentEnv, self).__init__()
        self.config = dict(act_dim=17, obs_dim=380, n_players=2, n_steps=1000)
        self.players = ["player_%d" % p for p in range(self.config['n_players'])]
        self.current_step = 0

    def _obs(self):
        return np.zeros((self.config['obs_dim'],))

    def reset(self):
        self.current_step = 0
        return {p: self._obs() for p in self.players}

    def step(self, action_dict):
        done = self.current_step >= self.config['n_steps']
        self.current_step += 1

        obs = {p: self._obs() for p in self.players}
        rew = {p: 0.0 for p in self.players}
        dones = {p: done for p in self.players + ["__all__"]}
        infos = {p: {} for p in self.players}

        return obs, rew, dones, infos

    @property
    def observation_space(self):
        return dim_to_gym_box(self.config['obs_dim'])

    @property
    def action_space(self):
        return dim_to_gym_box(self.config['act_dim'])

def create_env(config):
    """Create the dummy environment."""
    return DummyMultiAgentEnv(config)

env_name = "DummyMultiAgentEnv"
register_env(env_name, create_env)

def get_trainer_config(env_config, train_policies, num_workers=5, framework="tfe"):
    """Build configuration for 1 run."""

    # obtaining parameters from the environment
    env = create_env(env_config)
    act_space = env.action_space
    obs_space = env.observation_space
    players = env.players
    del env

    def get_policy_config(p):
        """Get policy configuration for a player."""
        return (PPOTFPolicy, obs_space, act_space, {
            "model": {
                "use_lstm": False,
                "fcnet_hiddens": [64, 64],
            },
            "framework": framework,
        })

    # creating policies
    policies = {p: get_policy_config(p) for p in players}

    # trainer config
    config = {
        "env": env_name, "env_config": env_config, "num_workers": num_workers,
        "multiagent": {"policy_mapping_fn": lambda x: x, "policies": policies,
                       "policies_to_train": train_policies},
        "framework": framework,
        "train_batch_size": 32768,
        "num_sgd_iter": 1,
        "sgd_minibatch_size": 32768,

        # 450 megabytes for each worker (enough for 1 iteration)
        "memory_per_worker": 450 * 1024 * 1024,
        "object_store_memory_per_worker": 128 * 1024 * 1024,
    }
    return config

def tune_run():
    ray.init(ignore_reinit_error=True)
    config = get_trainer_config(train_policies=['player_1', 'player_2'], env_config={})
    return tune.run("PPO", config=config, verbose=True, name="dummy_run", num_samples=1)

if __name__ == '__main__':
    tune_run()

richardliaw commented 4 years ago

@sergeivolodin How high does it go? Also, can you try different Tensorflow versions?

sergeivolodin commented 4 years ago

@richardliaw Here it ate all the remaining memory on my laptop and crashed WSL. Which versions of TensorFlow do you want me to try?

[screenshot: system memory being exhausted]

jyericlin commented 4 years ago

We also observed the same issue running QMIX with PyTorch on SMAC. In my case, the machine has 16 GB of RAM, and the training would eventually consume all of it and crash (at around iteration 600+ using the 2s3z map).

Tried setting object_store_memory=1*10**9 and redis_max_memory=1*10**9 for ray.init, but it didn't help.
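
A minimal sketch of how those settings are passed to ray.init (the actual call site is in run_qmix.py; kwargs as in Ray 0.8.x):

import ray

# Cap the object store and the Redis shards at ~1 GB each; as noted above,
# this did not stop the per-worker heap growth.
ray.init(
    object_store_memory=1 * 10**9,
    redis_max_memory=1 * 10**9,
)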

Ray version: 0.8.7
OS: Ubuntu 16.04
PyTorch version: 1.3.0
Command used: python run_qmix.py --num-iters=1000 --num-workers=7 --map-name=2s3z (using the example from the SMAC repository)

Error message:

Failure # 1 (occurred at 2020-08-20_13-12-55)
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/raydev/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 471, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/ubuntu/anaconda3/envs/raydev/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/ubuntu/anaconda3/envs/raydev/lib/python3.6/site-packages/ray/worker.py", line 1538, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RayOutOfMemoryError): ^[[36mray::QMIX.train()^[[39m (pid=949, ip=10.11.0.13)
  File "python/ray/_raylet.pyx", line 440, in ray._raylet.execute_task
  File "/home/ubuntu/anaconda3/envs/raydev/lib/python3.6/site-packages/ray/memory_monitor.py", line 128, in raise_if_low_memory
    self.error_threshold))
ray.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node rltb-1 is used (14.9 / 15.67 GB). The top 10 memory consumers are:

PID MEM COMMAND
949 6.25GiB ray::QMIX
4628    0.5GiB  /home/ubuntu/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 24827 -dataDir /home/ubu
4631    0.5GiB  /home/ubuntu/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 24013 -dataDir /home/ubu
4637    0.5GiB  /home/ubuntu/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 20642 -dataDir /home/ubu
4625    0.5GiB  /home/ubuntu/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 20480 -dataDir /home/ubu
4624    0.5GiB  /home/ubuntu/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 19296 -dataDir /home/ubu
4626    0.5GiB  /home/ubuntu/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 21827 -dataDir /home/ubu
4627    0.5GiB  /home/ubuntu/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 17905 -dataDir /home/ubu
sergeivolodin commented 4 years ago

@jyericlin as a workaround, we just create a separate process for every training step, which loads a checkpoint, does one iteration, and then saves a new checkpoint (pseudocode below, fully functional example here). The extra time spent on process creation/checkpointing does not seem too bad (for our case!)

def train(config_filename):
  # runs one training iteration in a fresh process, then checkpoints
  config = unpickle_config(config_filename)
  trainer = PPO(config)
  trainer.restore(config['checkpoint'])
  results = trainer.train()
  results['checkpoint'] = trainer.save()
  pickle_results(results)

  # important -- otherwise memory will go up!
  trainer.stop()

while True:
  config_filename = pickle_config(config)

  # starts a new python process from bash;
  # important: can't just fork, because of tensorflow+fork import issues
  start_process_and_wait(target=train, args=(config_filename,))

  results = unpickle_results(config)
  delete_temporary_files()
  if results['iteration'] > N:
    break
  config['checkpoint'] = results['checkpoint']

Note that in train() we also need to reconnect to the existing Ray instance to reuse the worker processes.
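
A minimal sketch of that reconnection (assuming the parent process has already started Ray; address="auto" attaches to the running instance instead of starting a new one):

import ray

def connect_to_existing_ray():
    # Attach to the Ray instance started by the parent process so the same
    # worker processes are reused instead of spinning up a new cluster.
    ray.init(address="auto", ignore_reinit_error=True)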

sergeivolodin commented 4 years ago

@richardliaw any updates? Do you want me to try different tf versions? Thank you.

richardliaw commented 4 years ago

Hmm, could you try the latest Ray wheels (latest snapshot of master) to see if this was fixed?

sergeivolodin commented 4 years ago

@richardliaw same thing

[screenshots: memory usage still climbing with the latest wheels]

sergeivolodin commented 4 years ago

@richardliaw is there a way to attach a Python debugger to a Ray worker, preferably from an IDE like PyCharm? I could take a look at where and why the memory is being used.

richardliaw commented 4 years ago

There's now a Ray PDB tool: https://docs.ray.io/en/master/ray-debugging.html?highlight=debugger
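
Roughly, per that page (the exact API has moved between Ray versions, and the function here is just an illustration, so treat this as a sketch): drop a breakpoint inside the task or actor code, then attach from a terminal with the ray debug CLI.

import ray

ray.init()

@ray.remote
def leaky_task(x):
    # Hitting breakpoint() inside a task/actor registers it with the Ray
    # debugger; running `ray debug` in a terminal on the same node then
    # lets you attach a PDB session to this worker.
    breakpoint()
    return x * 2

print(ray.get(leaky_task.remote(1)))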

jackbart94 commented 3 years ago

Is there any update on this issue? I have been dealing with the same problem.

azzeddineCH commented 3 years ago

Hi @sergeivolodin, how are you plotting these graphs, please? I suspect I'm having a similar issue.

sergeivolodin commented 3 years ago

@azzeddineCH using this custom script:

https://github.com/HumanCompatibleAI/better-adversarial-defenses/tree/master/other/memory_profile

  1. (assuming Python 3) $ pip install psutil numpy matplotlib humanize inquirer
  2. $ python mem_profile.py -- writes data for all of your user's processes to mem_out_{username}_{time_start}.txt (it will print the filename; the core loop is sketched below)
  3. Run $ python mem_analyze.py. For this last step, there are options.
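
The core of mem_profile.py is essentially a psutil polling loop; a simplified sketch of the idea (not the actual script):

import getpass
import time

import psutil

def poll_memory(out_file, interval_s=5.0):
    # Periodically record (timestamp, pid, rss bytes, command line) for all
    # processes owned by the current user, so per-worker memory growth can
    # be plotted afterwards.
    user = getpass.getuser()
    with open(out_file, "a") as f:
        while True:
            now = time.time()
            for proc in psutil.process_iter(attrs=["pid", "username", "memory_info", "cmdline"]):
                info = proc.info
                if info["username"] != user or info["memory_info"] is None:
                    continue
                cmd = " ".join(info["cmdline"] or [])
                f.write("%f\t%d\t%d\t%s\n" % (now, info["pid"], info["memory_info"].rss, cmd))
            f.flush()
            time.sleep(interval_s)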

Good luck!

Bam4d commented 3 years ago

I think I'm running into this too. It seems like there are "jumps" in memory at regular intervals for me. The leak is very slow in my case, and I'm pretty sure I've ruled out everything else it could be other than a bug in the multi-agent implementation in RLlib.

[screenshot: memory usage showing jumps at regular intervals]

This is with the latest wheels, Python 3.8, PyTorch 1.8.0.

Bam4d commented 3 years ago

I've done some more digging using tracemalloc, and I don't think the bug is actually in the Python code, as there are no large, consistent allocations of Python objects. That leaves a potential issue with PyTorch or some C++ library, or something to do with how Ray handles worker memory.
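
For reference, the kind of tracemalloc comparison involved looks roughly like this (a sketch, not the exact code used):

import tracemalloc

tracemalloc.start(25)  # keep up to 25 stack frames per allocation

baseline = tracemalloc.take_snapshot()
# ... run a few training iterations here ...
current = tracemalloc.take_snapshot()

# Largest Python-level allocation deltas between the two points; a genuine
# Python-object leak would show up as steadily growing entries here.
for stat in current.compare_to(baseline, "lineno")[:20]:
    print(stat)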

The bug also does not seem to happen consistently for me across machines. More specifically, on servers where cgroups are enabled I get the memory leak, but on my local machine I do not. @sven1977 any ideas? I added a bit more information here: https://discuss.ray.io/t/help-debugging-a-memory-leak-in-rllib/2100