Closed scottsun94 closed 1 year ago
For 1 and 2, can you check with the RL folks what the preferred output is? I am not sure what should replace "iteration".
4 will be fixed by https://github.com/ray-project/ray/pull/33871
For 1 and 2, @gjoliver @kouroshHakha can you comment on these two?
What happened + What you expected to happen
1. It seems that we do show "iteration". Not sure if that's a good thing, because users may not be familiar with the "iteration" concept.
2. We should ask the RLlib team to choose fewer metrics for the default output (not sure whether the selection should be customized per algorithm).
3. The format of the output should be updated, as mentioned in other related issues.
4. After I changed AIR_VERBOSITY to 0, I started to see the old output flow:
/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/progress_reporter.py:351: UserWarning: Both 'metric' and 'mode' must be set to be able to sort by metric. No sorting is performed.
  "Both 'metric' and 'mode' must be set to be able "
(pid=127980) 2023-03-29 09:22:12,160 WARNING framework.py:28 -- Not importing JAX for test purposes.
(pid=127980) 2023-03-29 09:22:12,483 WARNING framework.py:28 -- Not importing JAX for test purposes.
(Impala pid=127980) 2023-03-29 09:22:12,913 WARNING algorithm_config.py:637 -- Cannot create ImpalaConfig from given config_dict! Property __stdout_file__ not supported.
(Impala pid=127980) A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
(Impala pid=127980) [Powered by Stella]
(Impala pid=127980) 2023-03-29 09:22:13,519 INFO algorithm.py:528 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=128252) 2023-03-29 09:22:20,796 WARNING framework.py:28 -- Not importing JAX for test purposes.
(pid=128253) 2023-03-29 09:22:20,781 WARNING framework.py:28 -- Not importing JAX for test purposes.
(RolloutWorker pid=128252) A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
(RolloutWorker pid=128252) [Powered by Stella]
(RolloutWorker pid=128250) 2023-03-29 09:22:21,816 WARNING deprecation.py:51 -- DeprecationWarning: FrameStack has been deprecated. This will raise an error in the future!
== Status ==
Current time: 2023-03-29 09:22:05 (running for 00:00:02.06)
Using FIFO scheduling algorithm.
Logical resource usage: 11.0/32 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:M60)
Result logdir: /home/ray/ray_results/impala-breakoutnoframeskip-v5-torch
Number of trials: 1/1 (1 RUNNING)
+------------------------------------+----------+-------------------+-------------+
| Trial name                         | status   | loc               | framework   |
|------------------------------------+----------+-------------------+-------------|
| IMPALA_ALE_Breakout-v5_d70d1_00000 | RUNNING  | 10.0.2.182:127980 | torch       |
+------------------------------------+----------+-------------------+-------------+
(Impala pid=127980) 2023-03-29 09:22:25,927 INFO trainable.py:176 -- Trainable.setup took 12.410 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(Impala pid=127980) 2023-03-29 09:22:25,928 WARNING util.py:67 -- Install gputil for GPU system monitoring.
(pid=128258) 2023-03-29 09:22:21,337 WARNING framework.py:28 -- Not importing JAX for test purposes. [repeated 18x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication)
== Status ==
Current time: 2023-03-29 09:22:35 (running for 00:00:32.37)
Using FIFO scheduling algorithm.
Logical resource usage: 11.0/32 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:M60)
Result logdir: /home/ray/ray_results/impala-breakoutnoframeskip-v5-torch
Number of trials: 1/1 (1 RUNNING)
+------------------------------------+----------+-------------------+-------------+
| Trial name                         | status   | loc               | framework   |
|------------------------------------+----------+-------------------+-------------|
| IMPALA_ALE_Breakout-v5_d70d1_00000 | RUNNING  | 10.0.2.182:127980 | torch       |
+------------------------------------+----------+-------------------+-------------+
Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': 27.16767120361328, 'policy_loss': -2.7041698235955783e-25, 'entropy': 3.1160257634342076e-17, 'entropy_coeff': 0.01, 'var_gnorm': 9.948323249816895, 'vf_loss': 54.33534240722656, 'vf_explained_var': 0.025636136531829834}, 'model': {}, 'num_grad_updates_lifetime': 50.0, 'diff_num_grad_updates_vs_sampler_policy': 12.0}}, 'num_env_steps_sampled': 25250, 'num_env_steps_trained': 25000, 'num_agent_steps_sampled': 25250, 'num_agent_steps_trained': 25000, 'num_training_step_calls_since_last_synch_worker_weights': 79, 'num_weight_broadcasts': 79, 'num_samples_added_to_queue': 25000, 'learner_queue': {'size_count': 50, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 128.81, 'learner_load_time_ms': 5.331, 'learner_load_wait_time_ms': 53.762, 'learner_dequeue_time_ms': 2284.748}},sampler_results={'episode_reward_max': 6.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 1.2105263157894737, 'episode_len_mean': 717.4868421052631, 'episode_media': {}, 'episodes_this_iter': 152, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 4.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 
2.0, 2.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'episode_lengths': [516, 501, 806, 809, 801, 521, 523, 539, 516, 543, 525, 521, 517, 543, 531, 527, 532, 1406, 524, 809, 817, 800, 813, 817, 541, 531, 1414, 1430, 1414, 1406, 516, 791, 798, 805, 814, 819, 537, 536, 525, 517, 519, 1426, 1406, 516, 814, 812, 796, 988, 1016, 515, 536, 540, 529, 527, 1206, 531, 523, 1422, 540, 816, 800, 803, 804, 818, 535, 516, 543, 529, 543, 528, 1430, 531, 519, 1546, 794, 817, 790, 814, 816, 528, 528, 515, 537, 528, 1218, 519, 1228, 1224, 1220, 649, 790, 814, 819, 818, 813, 532, 516, 531, 515, 529, 516, 529, 1410, 529, 532, 800, 793, 791, 817, 790, 537, 540, 520, 529, 520, 527, 532, 1214, 515, 528, 515, 1212, 818, 813, 812, 814, 992, 528, 516, 528, 521, 528, 541, 535, 532, 540, 791, 796, 814, 794, 816, 521, 535, 539, 536, 532, 541, 527, 537, 531, 525, 533]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.1451655233545353, 'mean_inference_ms': 6.558934767208633, 'mean_action_processing_ms': 0.6327834007776333, 'mean_env_wait_ms': 7.896650866089233, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=6.0,episode_reward_min=0.0,episode_reward_mean=1.2105263157894737,episode_len_mean=717.4868421052631,episodes_this_iter=152,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 4.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 2.0, 2.0, 
2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'episode_lengths': [516, 501, 806, 809, 801, 521, 523, 539, 516, 543, 525, 521, 517, 543, 531, 527, 532, 1406, 524, 809, 817, 800, 813, 817, 541, 531, 1414, 1430, 1414, 1406, 516, 791, 798, 805, 814, 819, 537, 536, 525, 517, 519, 1426, 1406, 516, 814, 812, 796, 988, 1016, 515, 536, 540, 529, 527, 1206, 531, 523, 1422, 540, 816, 800, 803, 804, 818, 535, 516, 543, 529, 543, 528, 1430, 531, 519, 1546, 794, 817, 790, 814, 816, 528, 528, 515, 537, 528, 1218, 519, 1228, 1224, 1220, 649, 790, 814, 819, 818, 813, 532, 516, 531, 515, 529, 516, 529, 1410, 529, 532, 800, 793, 791, 817, 790, 537, 540, 520, 529, 520, 527, 532, 1214, 515, 528, 515, 1212, 818, 813, 812, 814, 992, 528, 516, 528, 521, 528, 541, 535, 532, 540, 791, 796, 814, 794, 816, 521, 535, 539, 536, 532, 541, 527, 537, 531, 525, 533]},sampler_perf={'mean_raw_obs_processing_ms': 3.1451655233545353, 'mean_inference_ms': 6.558934767208633, 'mean_action_processing_ms': 0.6327834007776333, 'mean_env_wait_ms': 7.896650866089233, 'mean_env_render_ms': 0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=20,num_remote_worker_restarts=0,num_agent_steps_sampled=25250,num_agent_steps_trained=25000,num_env_steps_sampled=25250,num_env_steps_trained=25000,num_env_steps_sampled_this_iter=25250,num_env_steps_trained_this_iter=25000,num_steps_trained_this_iter=25000,agent_timesteps_total=25250,timers={'training_iteration_time_ms': 0.348, 'sample_time_ms': 0.239, 'synch_weights_time_ms': 0.026},counters={'num_env_steps_sampled': 25250, 'num_env_steps_trained': 25000, 'num_agent_steps_sampled': 25250, 'num_agent_steps_trained': 25000, 'num_training_step_calls_since_last_synch_worker_weights': 79, 'num_weight_broadcasts': 79, 'num_samples_added_to_queue': 
25000},perf={'cpu_util_percent': 35.023529411764706, 'ram_util_percent': 5.2} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}. Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': 11.154390335083008, 'policy_loss': 0.0, 'entropy': 3.126514513951406e-08, 'entropy_coeff': 0.01, 'var_gnorm': 10.71700382232666, 'vf_loss': 22.308780670166016, 'vf_explained_var': 0.7661240696907043}, 'model': {}, 'num_grad_updates_lifetime': 110.0, 'diff_num_grad_updates_vs_sampler_policy': 13.0}}, 'num_env_steps_sampled': 55250, 'num_env_steps_trained': 55000, 'num_agent_steps_sampled': 55250, 'num_agent_steps_trained': 55000, 'num_training_step_calls_since_last_synch_worker_weights': 1036, 'num_weight_broadcasts': 143, 'num_samples_added_to_queue': 55000, 'learner_queue': {'size_count': 110, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 125.55, 'learner_load_time_ms': 5.345, 'learner_load_wait_time_ms': 55.379, 'learner_dequeue_time_ms': 2533.677}},sampler_results={'episode_reward_max': 9.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 2.3223684210526314, 'episode_len_mean': 835.8684210526316, 'episode_media': {}, 'episodes_this_iter': 152, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 0.0, 0.0, 5.0, 0.0, 9.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
9.0, 9.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0], 'episode_lengths': [540, 528, 515, 1422, 535, 1616, 1624, 1640, 1636, 536, 1640, 531, 535, 519, 528, 533, 520, 537, 540, 536, 540, 539, 1640, 519, 541, 520, 537, 524, 516, 1628, 1636, 535, 1410, 536, 539, 533, 528, 537, 1406, 521, 516, 516, 540, 529, 517, 528, 523, 516, 1620, 1636, 1224, 529, 527, 527, 515, 519, 1628, 1624, 1628, 519, 540, 520, 517, 543, 535, 520, 1308, 531, 515, 516, 1632, 1616, 536, 529, 1632, 1636, 516, 537, 517, 527, 535, 519, 524, 541, 515, 536, 520, 1644, 527, 539, 1616, 527, 1628, 541, 528, 1406, 525, 517, 529, 1426, 521, 520, 523, 516, 521, 528, 529, 1644, 1640, 515, 537, 524, 524, 517, 769, 531, 519, 1624, 1640, 531, 1644, 1620, 516, 516, 536, 536, 1426, 1418, 523, 1430, 531, 520, 535, 1620, 1644, 531, 543, 515, 1636, 1426, 529, 528, 1414, 1426, 533, 532, 541, 1644, 1620, 1640, 523, 540]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.0960928772947756, 'mean_inference_ms': 6.56534965209043, 'mean_action_processing_ms': 0.631634179682276, 'mean_env_wait_ms': 7.807842136017302, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=9.0,episode_reward_min=0.0,episode_reward_mean=2.3223684210526314,episode_len_mean=835.8684210526316,episodes_this_iter=152,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [0.0, 0.0, 0.0, 5.0, 0.0, 9.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 9.0, 9.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0], 'episode_lengths': [540, 528, 515, 1422, 535, 1616, 1624, 1640, 1636, 536, 1640, 531, 535, 519, 528, 533, 520, 537, 540, 536, 540, 539, 1640, 519, 541, 520, 537, 524, 516, 1628, 1636, 535, 1410, 536, 539, 533, 528, 537, 1406, 521, 516, 516, 540, 529, 517, 528, 523, 516, 1620, 1636, 1224, 529, 527, 527, 515, 519, 1628, 1624, 1628, 519, 540, 520, 517, 543, 535, 520, 1308, 531, 515, 516, 1632, 1616, 536, 529, 1632, 1636, 516, 537, 517, 527, 535, 519, 524, 541, 515, 536, 520, 1644, 527, 539, 1616, 527, 1628, 541, 528, 1406, 525, 517, 529, 1426, 521, 520, 523, 516, 521, 528, 529, 1644, 1640, 515, 537, 524, 524, 517, 769, 531, 519, 1624, 1640, 531, 1644, 1620, 516, 516, 536, 536, 1426, 1418, 523, 1430, 531, 520, 535, 1620, 1644, 531, 543, 515, 1636, 1426, 529, 528, 1414, 1426, 533, 532, 541, 1644, 1620, 1640, 523, 540]},sampler_perf={'mean_raw_obs_processing_ms': 3.0960928772947756, 'mean_inference_ms': 6.56534965209043, 'mean_action_processing_ms': 0.631634179682276, 'mean_env_wait_ms': 7.807842136017302, 'mean_env_render_ms': 
0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=20,num_remote_worker_restarts=0,num_agent_steps_sampled=55250,num_agent_steps_trained=55000,num_env_steps_sampled=55250,num_env_steps_trained=55000,num_env_steps_sampled_this_iter=30000,num_env_steps_trained_this_iter=30000,num_steps_trained_this_iter=30000,agent_timesteps_total=55250,timers={'training_iteration_time_ms': 0.347, 'sample_time_ms': 0.235, 'synch_weights_time_ms': 0.026},counters={'num_env_steps_sampled': 55250, 'num_env_steps_trained': 55000, 'num_agent_steps_sampled': 55250, 'num_agent_steps_trained': 55000, 'num_training_step_calls_since_last_synch_worker_weights': 1036, 'num_weight_broadcasts': 143, 'num_samples_added_to_queue': 55000},perf={'cpu_util_percent': 34.7, 'ram_util_percent': 5.24375} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}. 
Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': -1.586639165878296, 'policy_loss': 1.0538454055786133, 'entropy': 0.7403178215026855, 'entropy_coeff': 0.01, 'var_gnorm': 11.36281967163086, 'vf_loss': 1.9741454124450684, 'vf_explained_var': 0.9754490852355957}, 'model': {}, 'num_grad_updates_lifetime': 170.0, 'diff_num_grad_updates_vs_sampler_policy': 11.5}}, 'num_env_steps_sampled': 85500, 'num_env_steps_trained': 85000, 'num_agent_steps_sampled': 85500, 'num_agent_steps_trained': 85000, 'num_training_step_calls_since_last_synch_worker_weights': 0, 'num_weight_broadcasts': 222, 'num_samples_added_to_queue': 85500, 'learner_queue': {'size_count': 171, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 119.573, 'learner_load_time_ms': 4.85, 'learner_load_wait_time_ms': 60.447, 'learner_dequeue_time_ms': 2872.035}},sampler_results={'episode_reward_max': 9.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 2.704697986577181, 'episode_len_mean': 860.6912751677852, 'episode_media': {}, 'episodes_this_iter': 149, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 6.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 1.0, 0.0, 9.0, 
0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0], 'episode_lengths': [536, 1640, 524, 524, 1616, 531, 1624, 1628, 543, 540, 533, 540, 1616, 531, 517, 516, 519, 527, 540, 516, 525, 529, 541, 529, 536, 1624, 1624, 1620, 528, 528, 536, 1620, 536, 1628, 1616, 525, 1628, 543, 1628, 527, 532, 525, 1640, 520, 537, 515, 520, 1624, 1620, 532, 525, 521, 515, 1628, 1628, 523, 540, 539, 1632, 541, 528, 1640, 1624, 515, 1620, 1620, 516, 541, 519, 529, 517, 521, 520, 1616, 529, 543, 539, 515, 535, 515, 537, 1616, 1616, 517, 539, 521, 515, 523, 519, 528, 527, 533, 528, 536, 524, 540, 1620, 517, 527, 1632, 536, 1628, 1620, 525, 508, 537, 1628, 1644, 1636, 521, 536, 525, 1636, 527, 533, 525, 519, 528, 527, 1628, 519, 637, 536, 1640, 528, 529, 1644, 1644, 517, 520, 541, 1644, 527, 519, 527, 516, 529, 537, 1632, 1632, 528, 1616, 1640, 1628, 524, 1636, 533, 536, 511]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.0718346838518067, 'mean_inference_ms': 6.57333212573228, 'mean_action_processing_ms': 0.6328219398377659, 'mean_env_wait_ms': 7.789589344047068, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=9.0,episode_reward_min=0.0,episode_reward_mean=2.704697986577181,episode_len_mean=860.6912751677852,episodes_this_iter=149,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 6.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 
0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 1.0, 0.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0], 'episode_lengths': [536, 1640, 524, 524, 1616, 531, 1624, 1628, 543, 540, 533, 540, 1616, 531, 517, 516, 519, 527, 540, 516, 525, 529, 541, 529, 536, 1624, 1624, 1620, 528, 528, 536, 1620, 536, 1628, 1616, 525, 1628, 543, 1628, 527, 532, 525, 1640, 520, 537, 515, 520, 1624, 1620, 532, 525, 521, 515, 1628, 1628, 523, 540, 539, 1632, 541, 528, 1640, 1624, 515, 1620, 1620, 516, 541, 519, 529, 517, 521, 520, 1616, 529, 543, 539, 515, 535, 515, 537, 1616, 1616, 517, 539, 521, 515, 523, 519, 528, 527, 533, 528, 536, 524, 540, 1620, 517, 527, 1632, 536, 1628, 1620, 525, 508, 537, 1628, 1644, 1636, 521, 536, 525, 1636, 527, 533, 525, 519, 528, 527, 1628, 519, 637, 536, 1640, 528, 529, 1644, 1644, 517, 520, 541, 1644, 527, 519, 527, 516, 529, 537, 1632, 1632, 528, 1616, 1640, 1628, 524, 1636, 533, 536, 511]},sampler_perf={'mean_raw_obs_processing_ms': 3.0718346838518067, 'mean_inference_ms': 6.57333212573228, 'mean_action_processing_ms': 0.6328219398377659, 'mean_env_wait_ms': 7.789589344047068, 'mean_env_render_ms': 0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=19,num_remote_worker_restarts=0,num_agent_steps_sampled=85500,num_agent_steps_trained=85000,num_env_steps_sampled=85500,num_env_steps_trained=85000,num_env_steps_sampled_this_iter=30250,num_env_steps_trained_this_iter=30000,num_steps_trained_this_iter=30000,agent_timesteps_total=85500,timers={'training_iteration_time_ms': 1.886, 'sample_time_ms': 0.337, 'synch_weights_time_ms': 1.013},counters={'num_env_steps_sampled': 85500, 'num_env_steps_trained': 85000, 'num_agent_steps_sampled': 85500, 'num_agent_steps_trained': 85000, 'num_training_step_calls_since_last_synch_worker_weights': 0, 'num_weight_broadcasts': 222, 'num_samples_added_to_queue': 
85500},perf={'cpu_util_percent': 35.34375, 'ram_util_percent': 5.225} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}.
== Status ==
Current time: 2023-03-29 09:23:10 (running for 00:01:06.85)
Using FIFO scheduling algorithm.
Logical resource usage: 11.0/32 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:M60)
Result logdir: /home/ray/ray_results/impala-breakoutnoframeskip-v5-torch
Number of trials: 1/1 (1 RUNNING)
+------------------------------------+----------+-------------------+-------------+--------+----------------+----------------+----------------+------------------+---------------+
| Trial name                         | status   | loc               | framework   |   iter |   time_total_s |   ts (sampled) |   ts (trained) |   train_episodes |   reward_mean |
|------------------------------------+----------+-------------------+-------------+--------+----------------+----------------+----------------+------------------+---------------|
| IMPALA_ALE_Breakout-v5_d70d1_00000 | RUNNING  | 10.0.2.182:127980 | torch       |      3 |        34.3609 |          85500 |          85000 |              149 |        2.7047 |
+------------------------------------+----------+-------------------+-------------+--------+----------------+----------------+----------------+------------------+---------------+
Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': 13.279269218444824, 'policy_loss': 14.329699516296387, 'entropy': 0.9800087213516235, 'entropy_coeff': 0.01, 'var_gnorm': 12.450754165649414, 'vf_loss': 7.5032267570495605, 'vf_explained_var': 0.7622262835502625}, 'model': {}, 'num_grad_updates_lifetime': 230.0, 'diff_num_grad_updates_vs_sampler_policy': 11.5}}, 'num_env_steps_sampled': 115250, 'num_env_steps_trained': 115000, 'num_agent_steps_sampled': 115250, 'num_agent_steps_trained': 115000, 'num_training_step_calls_since_last_synch_worker_weights': 755, 'num_weight_broadcasts': 301, 'num_samples_added_to_queue': 115000, 'learner_queue': {'size_count': 230, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 124.085, 'learner_load_time_ms': 4.89, 'learner_load_wait_time_ms': 58.161, 'learner_dequeue_time_ms': 2890.702}},sampler_results={'episode_reward_max': 10.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 1.6746987951807228, 'episode_len_mean': 777.6144578313254, 'episode_media': {}, 'episodes_this_iter': 166, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [4.0, 2.0, 0.0, 3.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 3.0, 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 3.0, 3.0, 4.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 1.0, 5.0, 0.0, 4.0, 0.0, 9.0, 4.0, 3.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 5.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 10.0, 1.0, 4.0, 5.0, 0.0, 0.0, 3.0, 0.0, 3.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 6.0, 0.0, 1.0, 3.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 4.0, 3.0, 2.0, 2.0, 
0.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 6.0, 9.0, 0.0, 6.0, 3.0, 3.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 3.0, 2.0, 0.0, 0.0, 3.0, 2.0, 10.0, 1.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 3.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0], 'episode_lengths': [1214, 834, 516, 1022, 842, 492, 811, 806, 520, 1138, 513, 931, 822, 517, 812, 515, 525, 1202, 1026, 1014, 1202, 504, 636, 509, 809, 795, 516, 507, 512, 519, 810, 537, 810, 711, 1218, 526, 517, 830, 509, 835, 833, 829, 529, 648, 1317, 512, 1212, 532, 1624, 1207, 939, 543, 542, 830, 635, 508, 628, 516, 817, 507, 520, 1111, 1312, 795, 543, 529, 623, 720, 631, 809, 505, 532, 517, 503, 1144, 504, 520, 1300, 1296, 525, 503, 705, 507, 1730, 704, 1218, 1422, 514, 533, 872, 519, 935, 805, 520, 513, 830, 509, 718, 507, 1610, 533, 641, 1015, 814, 522, 839, 517, 817, 523, 620, 523, 812, 529, 524, 962, 995, 1004, 524, 1210, 1010, 806, 838, 517, 802, 790, 799, 697, 531, 536, 511, 890, 621, 1525, 1620, 520, 1611, 1018, 1026, 522, 761, 645, 804, 895, 515, 933, 814, 508, 535, 1000, 811, 1822, 697, 507, 512, 1230, 818, 819, 796, 907, 712, 532, 908, 531, 523, 525, 1205]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.094785060279142, 'mean_inference_ms': 6.571849059434162, 'mean_action_processing_ms': 0.6336455005969693, 'mean_env_wait_ms': 7.782669379977965, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=10.0,episode_reward_min=0.0,episode_reward_mean=1.6746987951807228,episode_len_mean=777.6144578313254,episodes_this_iter=166,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [4.0, 2.0, 0.0, 3.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 3.0, 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 3.0, 3.0, 4.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 1.0, 5.0, 0.0, 4.0, 0.0, 9.0, 4.0, 3.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 5.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 
0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 10.0, 1.0, 4.0, 5.0, 0.0, 0.0, 3.0, 0.0, 3.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 6.0, 0.0, 1.0, 3.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 4.0, 3.0, 2.0, 2.0, 0.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 6.0, 9.0, 0.0, 6.0, 3.0, 3.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 3.0, 2.0, 0.0, 0.0, 3.0, 2.0, 10.0, 1.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 3.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0], 'episode_lengths': [1214, 834, 516, 1022, 842, 492, 811, 806, 520, 1138, 513, 931, 822, 517, 812, 515, 525, 1202, 1026, 1014, 1202, 504, 636, 509, 809, 795, 516, 507, 512, 519, 810, 537, 810, 711, 1218, 526, 517, 830, 509, 835, 833, 829, 529, 648, 1317, 512, 1212, 532, 1624, 1207, 939, 543, 542, 830, 635, 508, 628, 516, 817, 507, 520, 1111, 1312, 795, 543, 529, 623, 720, 631, 809, 505, 532, 517, 503, 1144, 504, 520, 1300, 1296, 525, 503, 705, 507, 1730, 704, 1218, 1422, 514, 533, 872, 519, 935, 805, 520, 513, 830, 509, 718, 507, 1610, 533, 641, 1015, 814, 522, 839, 517, 817, 523, 620, 523, 812, 529, 524, 962, 995, 1004, 524, 1210, 1010, 806, 838, 517, 802, 790, 799, 697, 531, 536, 511, 890, 621, 1525, 1620, 520, 1611, 1018, 1026, 522, 761, 645, 804, 895, 515, 933, 814, 508, 535, 1000, 811, 1822, 697, 507, 512, 1230, 818, 819, 796, 907, 712, 532, 908, 531, 523, 525, 1205]},sampler_perf={'mean_raw_obs_processing_ms': 3.094785060279142, 'mean_inference_ms': 6.571849059434162, 'mean_action_processing_ms': 0.6336455005969693, 'mean_env_wait_ms': 7.782669379977965, 'mean_env_render_ms': 0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=20,num_remote_worker_restarts=0,num_agent_steps_sampled=115250,num_agent_steps_trained=115000,num_env_steps_sampled=115250,num_env_steps_trained=115000,num_env_steps_sampled_this_iter=29750,num_env_steps_trained_this_iter=30000,num_steps_trained_this_iter=30000,agent_timesteps_total=115250,timers={'training_iteration_time_ms': 0.367, 
'sample_time_ms': 0.255, 'synch_weights_time_ms': 0.028},counters={'num_env_steps_sampled': 115250, 'num_env_steps_trained': 115000, 'num_agent_steps_sampled': 115250, 'num_agent_steps_trained': 115000, 'num_training_step_calls_since_last_synch_worker_weights': 755, 'num_weight_broadcasts': 301, 'num_samples_added_to_queue': 115000},perf={'cpu_util_percent': 34.54117647058823, 'ram_util_percent': 5.2294117647058815} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}.
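On the suggestion to show fewer metrics by default: below is a rough sketch of what a trimmed per-report line could look like. It is only an illustration. The key selection in `DEFAULT_KEYS` and the helpers `summarize_result` and `format_summary` are hypothetical names, not RLlib or Tune APIs, and which metrics to keep is still up to the RLlib team. The sample values are copied from the third reported result in the log above.

```python
from typing import Any, Dict, Mapping, Sequence

# Hypothetical default selection, chosen only for illustration.
DEFAULT_KEYS: Sequence[str] = (
    "episode_reward_mean",
    "episode_len_mean",
    "num_env_steps_sampled",
    "num_env_steps_trained",
)


def summarize_result(
    result: Mapping[str, Any], keys: Sequence[str] = DEFAULT_KEYS
) -> Dict[str, Any]:
    """Keep only the selected top-level metrics that are present in `result`."""
    return {k: result[k] for k in keys if k in result}


def format_summary(trial_name: str, summary: Mapping[str, Any]) -> str:
    """Render the summary on one line; floats are shortened to 4 significant digits."""
    parts = []
    for key, value in summary.items():
        if isinstance(value, float):
            parts.append(f"{key}={value:.4g}")
        else:
            parts.append(f"{key}={value}")
    return f"Trial {trial_name}: " + ", ".join(parts)


# A small excerpt of the third reported result from the log.
example = {
    "custom_metrics": {},
    "episode_reward_max": 9.0,
    "episode_reward_mean": 2.704697986577181,
    "episode_len_mean": 860.6912751677852,
    "episodes_this_iter": 149,
    "num_env_steps_sampled": 85500,
    "num_env_steps_trained": 85000,
    "timers": {"training_iteration_time_ms": 1.886},
}

print(format_summary("IMPALA_ALE_Breakout-v5_d70d1_00000", summarize_result(example)))
```

Nested entries such as `hist_stats` or `info` are dropped here simply because they are not in the selected keys. A per-algorithm selection would only need a different `keys` argument.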