ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib][k8s] Why does the code stop running without reporting errors? #12488

Closed Glaucus-2G closed 3 years ago

Glaucus-2G commented 3 years ago

I run my code in a k8s cluster, and it works properly when I use a Gym environment like "Pendulum".

But when I use my own environment, it reports some errors. Because my environment has a continuous action space, I have tried PPO and DDPG.

When I use PPO, it returns actions as NaN, so the experiment dies. I also tried DDPG, but every time after it runs about 100 iterations, it stops training. I never kill the process, but it has kept printing the following output for several days.

Using FIFO scheduling algorithm.
Resources requested: 61/64 CPUs, 0/32 GPUs, 0.0/191.75 GiB heap, 0.0/58.45 GiB objects
Result logdir: /root/ray_results/DDPG
Number of trials: 1 (1 RUNNING)
+------------------+----------+---------------+--------+------------------+--------+----------+
| Trial name       | status   | loc           |   iter |   total time (s) |     ts |   reward |
|------------------+----------+---------------+--------+------------------+--------+----------|
| DDPG_myenv_00000 | RUNNING  | 10.11.0.15:47 |    111 |          2475.68 | 113220 |      nan |
+------------------+----------+---------------+--------+------------------+--------+----------+

My code is like this:

import gym
import ray
from ray import tune
from ray.tune.registry import register_env


class MyEnv(gym.Env):
    """Thin wrapper exposing the custom Ucav-v0 environment to RLlib."""

    def __init__(self, env_config):
        self.env = gym.make('Ucav-v0')
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward, done, info


register_env("myenv", lambda config: MyEnv(config))


def main():
    # Connect to the existing Ray cluster running in k8s.
    ray.init(address="127.0.0.1:6379")
    tune.run(
        "DDPG",
        stop={'training_iteration': 1000000},
        config={
            "env": "myenv",
            "actor_lr": 1e-3,
            "critic_lr": 1e-3,
            "num_workers": 60,
            "use_pytorch": False,
            # "rollout_fragment_length": 200,
            "train_batch_size": 2000,
            "batch_mode": "truncate_episodes",
        },
    )


if __name__ == "__main__":
    main()

Python environment: Python 3.7, Ray 0.8.4, torch 1.3.1, torchvision 0.4.2, TensorFlow 2.3.0
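
To check whether the NaN comes from the environment itself rather than from the policy, one thing I can do is wrap the env with finite-value assertions before handing it to RLlib. A minimal sketch (the FiniteCheckEnv wrapper and the "myenv_checked" key are just placeholders; it reuses MyEnv from the snippet above):

import gym
import numpy as np
from ray.tune.registry import register_env


class FiniteCheckEnv(gym.Wrapper):
    """Assert that observations and rewards are finite (no NaN/inf)."""

    def reset(self):
        obs = self.env.reset()
        assert np.all(np.isfinite(obs)), "non-finite values in reset() observation"
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        assert np.all(np.isfinite(obs)), "non-finite values in observation"
        assert np.isfinite(reward), "non-finite reward"
        return obs, reward, done, info


register_env("myenv_checked", lambda config: FiniteCheckEnv(MyEnv(config)))

If an assertion fires, the NaN originates in the environment; otherwise it is being produced on the policy side.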

richardliaw commented 3 years ago

@Glaucus-2G can you please provide instructions to reproduce on CartPole-v0?

Glaucus-2G commented 3 years ago

@Glaucus-2G can you please provide instructions to reproduce on CartPole-v0?

I have killed my code's process, and when I try to run the code on Pendulum-v0, it reports this:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/opt/conda/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 381, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/opt/conda/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError: ray::DDPG.train() (pid=487, ip=10.11.0.7)
  File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 502, in train
    raise e
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 491, in train
    result = Trainable.train(self)
  File "/opt/conda/lib/python3.7/site-packages/ray/tune/trainable.py", line 261, in train
    result = self._train()
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 150, in _train
    fetches = self.optimizer.step()
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/optimizers/sync_replay_optimizer.py", line 108, in step
    weights = ray.put(self.workers.local_worker().get_weights())
  File "python/ray/_raylet.pyx", line 746, in ray._raylet.CoreWorker.put_serialized_object
  File "python/ray/_raylet.pyx", line 720, in ray._raylet.CoreWorker._create_put_buffer
  File "python/ray/_raylet.pyx", line 134, in ray._raylet.check_status
ray.exceptions.ObjectStoreFullError: Failed to put object 7292891120c8ed877a657a030800008801000000 in object store because it is full. Object size is 3921946 bytes.
The local object store is full of objects that are still in scope and cannot be evicted. Try increasing the object store memory available with ray.init(object_store_memory=<bytes>). You can also try setting an option to fallback to LRU eviction when the object store is full by calling ray.init(lru_evict=True). See also: https://ray.readthedocs.io/en/latest/memory-management.html.

So the reason it stops training is that the object store memory is full? Anyway, I'll restart the cluster and run it again.
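
For reference, this is roughly what the two workarounds mentioned in the error message look like when starting Ray (the 20 GiB figure is only an example; since my script just connects to an existing cluster with ray.init(address=...), the object store size actually has to be set when the nodes are launched, e.g. via ray start --object-store-memory):

import ray

# Only applies to the process that starts Ray itself, not to one that merely
# connects to a running cluster. The numbers are examples, not tuned values.
ray.init(
    object_store_memory=20 * 1024 ** 3,  # object store size in bytes (~20 GiB)
    lru_evict=True,  # fall back to LRU eviction when the store is full
)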

Glaucus-2G commented 3 years ago

PPO on CartPole-v0 works properly; the output looks like this:

Result for PPO_myenv_00000:
  custom_metrics: {}
  date: 2020-11-30_05-55-43
  done: false
  episode_len_mean: 198.18
  episode_reward_max: 200.0
  episode_reward_mean: 198.18
  episode_reward_min: 103.0
  episodes_this_iter: 61
  episodes_total: 5687
  experiment_id: f19b84a3510c4587af9ddd1468ef7181
  experiment_tag: '0'
  hostname: ray-worker-0-18
  info:
    grad_time_ms: 14063.637
    learner:
      default_policy:
        cur_kl_coeff: 0.30000001192092896
        cur_lr: 0.0010000000474974513
        entropy: 0.3195417821407318
        entropy_coeff: 0.0
        kl: 0.010834978893399239
        model: {}
        policy_loss: -0.007023087237030268
        total_loss: 554.733642578125
        vf_explained_var: 0.21664145588874817
        vf_loss: 554.7373657226562
    load_time_ms: 5.792
    num_steps_sampled: 984000
    num_steps_trained: 976128
    sample_time_ms: 571.986
    update_time_ms: 31.434
  iterations_since_restore: 82
  node_ip: 10.11.0.18
  num_healthy_workers: 60
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 9.276190476190479
    ram_util_percent: 11.800000000000002
  pid: 45
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_env_wait_ms: 0.11847467378902778
    mean_inference_ms: 1.4094175804260436
    mean_processing_ms: 0.2300744801790857
  time_since_restore: 1212.2398884296417
  time_this_iter_s: 14.623253583908081
  time_total_s: 1212.2398884296417
  timestamp: 1606715743
  timesteps_since_restore: 984000
  timesteps_this_iter: 12000
  timesteps_total: 984000
  training_iteration: 82
  trial_id: '00000'

== Status ==
Memory usage on this node: 72.0/251.4 GiB
Using FIFO scheduling algorithm.
Resources requested: 61/64 CPUs, 0/32 GPUs, 0.0/191.75 GiB heap, 0.0/58.45 GiB objects
Result logdir: /root/ray_results/PPO
Number of trials: 1 (1 RUNNING)
+-----------------+----------+---------------+--------+------------------+--------+----------+
| Trial name      | status   | loc           |   iter |   total time (s) |     ts |   reward |
|-----------------+----------+---------------+--------+------------------+--------+----------|
| PPO_myenv_00000 | RUNNING  | 10.11.0.18:45 |     82 |          1212.24 | 984000 |   198.18 |
+-----------------+----------+---------------+--------+------------------+--------+----------+

PPO on my environment doesn't report NaN any more. But just like DDPG, it stops training. I've been running the experiment for 16 hours, but the total time shown there (18737.5 s) is only about 5 hours.

root@ray-head-0-5:~# tail -f out.log 
Using FIFO scheduling algorithm.
Resources requested: 61/64 CPUs, 0/32 GPUs, 0.0/230.08 GiB heap, 0.0/70.8 GiB objects
Result logdir: /root/ray_results/PPO
Number of trials: 1 (1 RUNNING)
+-----------------+----------+----------------+--------+------------------+----------+----------+
| Trial name      | status   | loc            |   iter |   total time (s) |       ts |   reward |
|-----------------+----------+----------------+--------+------------------+----------+----------|
| PPO_myenv_00000 | RUNNING  | 10.11.0.12:181 |    114 |          18737.5 | 13680000 |   500446 |
+-----------------+----------+----------------+--------+------------------+----------+----------+

There is no error reported. What should I do about it?

Glaucus-2G commented 3 years ago

After many attempts, this happens every time after about 114 iterations.

And there was no memory leak according to the dashboard.

How should I investigate this problem?

ericl commented 3 years ago

This is quite odd. If you are able to reproduce with a dummy env (post a reproduction snippet here with np.zeros() for obs and so on), we can look into it further.
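
For example, something along these lines would be enough (a rough sketch; the "dummyenv" name and the observation/action shapes are placeholders to be replaced with ones matching your real env):

import gym
import numpy as np
from gym.spaces import Box

import ray
from ray import tune
from ray.tune.registry import register_env


class DummyEnv(gym.Env):
    """Minimal stand-in env: constant zero observations and rewards."""

    def __init__(self, env_config):
        self.observation_space = Box(-1.0, 1.0, shape=(8,), dtype=np.float32)
        self.action_space = Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        self.steps += 1
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        done = self.steps >= 200  # fixed-length episodes
        return obs, 0.0, done, {}


register_env("dummyenv", lambda config: DummyEnv(config))

if __name__ == "__main__":
    ray.init()
    tune.run(
        "PPO",
        stop={"training_iteration": 200},
        config={"env": "dummyenv", "num_workers": 60},
    )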