@Glaucus-2G can you please provide instructions to reproduce on CartPole-v0?
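(For reference, a minimal reproduction script for PPO on CartPole-v0 with Tune might look like the sketch below; the worker count and stop criterion are placeholders, not the reporter's actual settings.)

import ray
from ray import tune

ray.init()

# Minimal PPO run on CartPole-v0; config and stop values are examples only.
tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",
        "num_workers": 2,
    },
    stop={"training_iteration": 50},
)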
I have killed my code's process, and when I try to run the code on Pendulum-v0, it reports this:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/opt/conda/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 381, in fetch_result
result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
File "/opt/conda/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError: ray::DDPG.train() (pid=487, ip=10.11.0.7)
File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
File "/opt/conda/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 502, in train
raise e
File "/opt/conda/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 491, in train
result = Trainable.train(self)
File "/opt/conda/lib/python3.7/site-packages/ray/tune/trainable.py", line 261, in train
result = self._train()
File "/opt/conda/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 150, in _train
fetches = self.optimizer.step()
File "/opt/conda/lib/python3.7/site-packages/ray/rllib/optimizers/sync_replay_optimizer.py", line 108, in step
weights = ray.put(self.workers.local_worker().get_weights())
File "python/ray/_raylet.pyx", line 746, in ray._raylet.CoreWorker.put_serialized_object
File "python/ray/_raylet.pyx", line 720, in ray._raylet.CoreWorker._create_put_buffer
File "python/ray/_raylet.pyx", line 134, in ray._raylet.check_status
ray.exceptions.ObjectStoreFullError: Failed to put object 7292891120c8ed877a657a030800008801000000 in object store because it is full. Object size is 3921946 bytes.
The local object store is full of objects that are still in scope and cannot be evicted. Try increasing the object store memory available with ray.init(object_store_memory=<bytes>). You can also try setting an option to fallback to LRU eviction when the object store is full by calling ray.init(lru_evict=True). See also: https://ray.readthedocs.io/en/latest/memory-management.html.
Is the reason it's not training that the object store memory is full? Anyway, I'll restart the cluster and run it again.
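(Side note: the two options mentioned in the error message can be passed to ray.init() roughly as below; the 2 GB figure is only a placeholder, not a recommendation for this cluster.)

import ray

# Give the object store more room (value in bytes) and/or fall back to
# LRU eviction when it fills up, as the error message suggests.
ray.init(
    object_store_memory=2 * 1024 ** 3,  # example size only
    lru_evict=True,
)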
PPO on CartPole-v0 works properly; the output looks like this:
Result for PPO_myenv_00000:
  custom_metrics: {}
  date: 2020-11-30_05-55-43
  done: false
  episode_len_mean: 198.18
  episode_reward_max: 200.0
  episode_reward_mean: 198.18
  episode_reward_min: 103.0
  episodes_this_iter: 61
  episodes_total: 5687
  experiment_id: f19b84a3510c4587af9ddd1468ef7181
  experiment_tag: '0'
  hostname: ray-worker-0-18
  info:
    grad_time_ms: 14063.637
    learner:
      default_policy:
        cur_kl_coeff: 0.30000001192092896
        cur_lr: 0.0010000000474974513
        entropy: 0.3195417821407318
        entropy_coeff: 0.0
        kl: 0.010834978893399239
        model: {}
        policy_loss: -0.007023087237030268
        total_loss: 554.733642578125
        vf_explained_var: 0.21664145588874817
        vf_loss: 554.7373657226562
    load_time_ms: 5.792
    num_steps_sampled: 984000
    num_steps_trained: 976128
    sample_time_ms: 571.986
    update_time_ms: 31.434
  iterations_since_restore: 82
  node_ip: 10.11.0.18
  num_healthy_workers: 60
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 9.276190476190479
    ram_util_percent: 11.800000000000002
  pid: 45
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_env_wait_ms: 0.11847467378902778
    mean_inference_ms: 1.4094175804260436
    mean_processing_ms: 0.2300744801790857
  time_since_restore: 1212.2398884296417
  time_this_iter_s: 14.623253583908081
  time_total_s: 1212.2398884296417
  timestamp: 1606715743
  timesteps_since_restore: 984000
  timesteps_this_iter: 12000
  timesteps_total: 984000
  training_iteration: 82
  trial_id: '00000'
== Status ==
Memory usage on this node: 72.0/251.4 GiB
Using FIFO scheduling algorithm.
Resources requested: 61/64 CPUs, 0/32 GPUs, 0.0/191.75 GiB heap, 0.0/58.45 GiB objects
Result logdir: /root/ray_results/PPO
Number of trials: 1 (1 RUNNING)
+-----------------+----------+---------------+--------+------------------+--------+----------+
| Trial name | status | loc | iter | total time (s) | ts | reward |
|-----------------+----------+---------------+--------+------------------+--------+----------|
| PPO_myenv_00000 | RUNNING | 10.11.0.18:45 | 82 | 1212.24 | 984000 | 198.18 |
+-----------------+----------+---------------+--------+------------------+--------+----------+
PPO on my environment doesn't report NaN any more. But, like DDPG, it stops training. I've been running the experiment for 16 hours, but the total time shown there is only about 5 hours.
root@ray-head-0-5:~# tail -f out.log
Using FIFO scheduling algorithm.
Resources requested: 61/64 CPUs, 0/32 GPUs, 0.0/230.08 GiB heap, 0.0/70.8 GiB objects
Result logdir: /root/ray_results/PPO
Number of trials: 1 (1 RUNNING)
+-----------------+----------+----------------+--------+------------------+----------+----------+
| Trial name | status | loc | iter | total time (s) | ts | reward |
|-----------------+----------+----------------+--------+------------------+----------+----------|
| PPO_myenv_00000 | RUNNING | 10.11.0.12:181 | 114 | 18737.5 | 13680000 | 500446 |
+-----------------+----------+----------------+--------+------------------+----------+----------+
There is no error reported. What should I do about it?
After many attempts, this happens every time after about 114 iterations.
And there was no memory leak according to the dashboard.
How should I debug this problem?
This is quite odd. If you are able to reproduce with a dummy env (post a reproduction snippet here with np.zeros() for obs and so on), we can look into it further.
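(For illustration, a dummy env along those lines might look like the sketch below; the observation/action shapes, bounds, and episode length are placeholders, not the reporter's actual environment.)

import gym
import numpy as np
from gym.spaces import Box


class DummyEnv(gym.Env):
    """Stand-in env that returns np.zeros() observations, as suggested above."""

    def __init__(self, config=None):
        # Placeholder continuous spaces; swap in the real shapes and bounds.
        self.observation_space = Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        self.steps += 1
        done = self.steps >= 200  # arbitrary episode length
        return np.zeros(4, dtype=np.float32), 0.0, done, {}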
I run my code on a k8s cluster, and it works properly when I use a Gym environment like "Pendulum".
But when I use my own environment, it reports some errors. Because my environment has a continuous action space, I have tried PPO and DDPG.
When I use PPO, it returns NaN actions, so the experiment dies. I also tried DDPG, but every time after about 100 iterations it stops training. I never killed the process, but it has kept producing the following output for several days.
My code is like this:
Python environment versions: Python 3.7, ray 0.8.4, torch 1.3.1, torchvision 0.4.2, tensorflow 2.3.0