ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib] rllib trainer fails to load from checkpoint, the custom model weights are not being updated. #9253

Closed chamorajg closed 3 years ago

chamorajg commented 4 years ago

After restoring from a checkpoint, the trainer reports the same episode reward mean as a trainer starting afresh.

Ray version 0.8.5, Python 3.6, PyTorch, Ubuntu 16.04. The episode reward after restoring and after the first iteration are the same as an untrained run, even though the trainer reported a positive episode reward mean when the checkpoint was saved.

```
2020-07-01 18:00:03,733 INFO resource_spec.py:212 -- Starting Ray with 15.92 GiB memory available for workers and up to 7.96 GiB for objects. You can adjust these settings with ray.init(memory=, object_store_memory=).
2020-07-01 18:00:04,085 INFO services.py:1170 -- View the Ray dashboard at localhost:8265
2020-07-01 18:00:05,012 INFO trainer.py:580 -- Current log_level is ERROR. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float16
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
2020-07-01 18:00:09,084 INFO trainable.py:217 -- Getting current IP.
2020-07-01 18:00:09,084 WARNING util.py:37 -- Install gputil for GPU system monitoring.
2020-07-01 18:00:09,117 INFO trainable.py:217 -- Getting current IP.
2020-07-01 18:00:09,118 INFO trainable.py:423 -- Restored on 172.31.8.135 from checkpoint: /data/mouli/weights/ray_weight/checkpoint_18/checkpoint-18
2020-07-01 18:00:09,118 INFO trainable.py:430 -- Current state after restoring: {'_iteration': 18, '_timesteps_total': None, '_time_total': 18517.08517551422, '_episodes_total': 72}
(pid=17380) /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float16
(pid=17380)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=17379) /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float16
(pid=17379)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=17385) /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float16
(pid=17385)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=17381) /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float16
(pid=17381)   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=17380) Resetting the environment
(pid=17385) Resetting the environment
(pid=17379) Resetting the environment
(pid=17381) Resetting the environment
custom_metrics: {}
date: 2020-07-01_18-08-05
done: false
episode_len_mean: 5134.0
episode_reward_max: -33356.300000000105
episode_reward_mean: -33644.5250000001
episode_reward_min: -34009.0500000001
episodes_this_iter: 4
episodes_total: 76
experiment_id: 72d8dd3e8b2a48f59e87fe020cc7cc81
hostname: ip-172-31-8-135
info:
  last_target_update_ts: 390184
  learner:
    default_policy:
      allreduce_latency: 0.0
      cur_lr: 0.0001
      grad_gnorm: 7.260672706479739
      max_q: -0.8520004153251648
      mean_q: -1.1661407947540283
      mean_td_error: 6.130273342132568
      min_q: -1.7016963958740234
      td_error: "[6.9096565 7.134803 6.9150834 6.958553 6.8597746 6.9058785 6.8203487 6.8212156 6.768637 0.6852819 6.979598 6.985238 6.648487 7.0201597 6.7820287 6.9820337 0.94858754 0.9307786 2.457552 6.7885966 6.7985487 6.26636 6.756381 6.9849377 6.779475 6.7952523 6.5938654 6.5848207 6.6272287 7.0291796 6.69104 6.959365 ]"
  num_steps_sampled: 390184
  num_steps_trained: 608
  num_target_updates: 19
iterations_since_restore: 1
node_ip: 172.31.8.135
num_healthy_workers: 4
off_policy_estimator: {}
perf:
  cpu_util_percent: 90.81380323054331
  ram_util_percent: 24.028781204111603
pid: 17335
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
  mean_env_wait_ms: 89.42585836482954
  mean_inference_ms: 2.540503353780061
  mean_processing_ms: 0.34010035385755855
```

```
Traceback (most recent call last):
  File "ray_drl.py", line 606, in <module>
    print(check(weights_before_load, weights_after_load, false=False))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ray/rllib/utils/test_utils.py", line 204, in check
    raise e
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ray/rllib/utils/test_utils.py", line 199, in check
    np.testing.assert_almost_equal(x, y, decimal=decimals)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 588, in assert_almost_equal
    raise AssertionError(_build_err_msg())
AssertionError:
Arrays are not almost equal to 5 decimals
```

The weights before and after restoring are not the same, and the restored trainer's episode reward does not match that of the pretrained model.
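The traceback above comes from comparing the two weight dicts element-wise. The idea can be illustrated with a small self-contained helper (a hypothetical re-implementation for illustration, not the actual `ray.rllib.utils.test_utils.check` code, which also handles nesting, tolerances, and a `false=` flag):

```python
import numpy as np

def weights_almost_equal(w1, w2, decimals=5):
    """Return True if two flat weight dicts match key-for-key to
    `decimals` places; a simplified stand-in for RLlib's check()."""
    if w1.keys() != w2.keys():
        return False
    try:
        for k in w1:
            np.testing.assert_almost_equal(w1[k], w2[k], decimal=decimals)
    except AssertionError:
        return False
    return True

before    = {"fc.weight": np.array([0.12345, 0.67890])}
after_ok  = {"fc.weight": np.array([0.12345, 0.67890])}
after_bad = {"fc.weight": np.array([0.5, 0.5])}

print(weights_almost_equal(before, after_ok))   # True
print(weights_almost_equal(before, after_bad))  # False
```

A failing comparison like the traceback's `AssertionError` therefore means the restored policy's weights genuinely differ from the snapshot taken before the restore.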

stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public Slack channel.