ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[AIR output] UX issues of new AIR output for learning_tests_impala_torch (AIR_VERBOSITY=0, 1, 2) #33867

Closed · scottsun94 closed this issue 1 year ago

scottsun94 commented 1 year ago

What happened + What you expected to happen

  1. It seems that we do show "iteration". I'm not sure whether that's a good thing, since users may not be familiar with the "iteration" concept.

  2. We should ask the RLlib team to choose fewer metrics for the default output (it's unclear whether the selection should be customized per algorithm).

  3. The format of the output should be updated, as mentioned in other related issues:

    Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)
    agent_timesteps_total: 389000
    connector_metrics: {}
    counters:
      num_agent_steps_sampled: 389000
      num_agent_steps_trained: 388500
      num_env_steps_sampled: 389000
      num_env_steps_trained: 388500
      num_samples_added_to_queue: 389000
      num_training_step_calls_since_last_synch_worker_weights: 134
      num_weight_broadcasts: 964
    custom_metrics: {}
    episode_len_mean: 1721.88
    episode_media: {}
    episode_reward_max: 36.0
    episode_reward_mean: 9.77
    episode_reward_min: 4.0
    episodes_this_iter: 67
    episodes_total: 1710
    info:
      learner:
        default_policy:
          custom_metrics: {}
          diff_num_grad_updates_vs_sampler_policy: 10.0
          learner_stats:
            cur_lr: 0.0005
            entropy: 1.0906739234924316
            entropy_coeff: 0.01
            policy_loss: -32.83232116699219
            total_loss: -28.335006713867188
            var_gnorm: 16.424108505249023
            vf_explained_var: 0.6170328259468079
            vf_loss: 19.683231353759766
          model: {}
          num_grad_updates_lifetime: 777.0
      learner_queue:
        size_count: 778
        size_mean: 0.0
        size_quantiles: [0.0, 0.0, 0.0, 0.0, 0.0]
        size_std: 0.0
      num_agent_steps_sampled: 389000
      num_agent_steps_trained: 388500
      num_env_steps_sampled: 389000
      num_env_steps_trained: 388500
      num_samples_added_to_queue: 389000
      num_training_step_calls_since_last_synch_worker_weights: 134
      num_weight_broadcasts: 964
      timing_breakdown:
        learner_dequeue_time_ms: 2772.957
        learner_grad_time_ms: 123.634
        learner_load_time_ms: 4.319
        learner_load_wait_time_ms: 47.829
    num_agent_steps_sampled: 389000
    num_agent_steps_trained: 388500
    num_env_steps_sampled: 389000
    num_env_steps_sampled_this_iter: 30750
    num_env_steps_trained: 388500
    num_env_steps_trained_this_iter: 31000
    num_faulty_episodes: 0
    num_healthy_workers: 10
    num_in_flight_async_reqs: 20
    num_remote_worker_restarts: 0
    num_steps_trained_this_iter: 31000
    perf:
      cpu_util_percent: 34.94117647058823
      ram_util_percent: 5.211764705882353
    policy_reward_max: {}
    policy_reward_mean: {}
    policy_reward_min: {}
    sampler_perf:
      mean_action_processing_ms: 0.6395067034460499
      mean_env_render_ms: 0.0
      mean_env_wait_ms: 7.840870172264184
      mean_inference_ms: 6.614064184668028
      mean_raw_obs_processing_ms: 2.9528597540097277
    sampler_results:
      connector_metrics: {}
      custom_metrics: {}
      episode_len_mean: 1721.88
      episode_media: {}
      episode_reward_max: 36.0
      episode_reward_mean: 9.77
      episode_reward_min: 4.0
      episodes_this_iter: 67
      hist_stats:
        episode_lengths: [1414, 1306, 1641, 1446, 1234, 2026, 1600, 2454, 1359, 1572,
          1411, 1471, 1463, 1269, 1347, 1083, 2344, 1095, 1956, 1603, 1255, 2218, 1208,
          1943, 1483, 1158, 2108, 1073, 1535, 2590, 1804, 1802, 2109, 1783, 1099, 1258,
          1211, 1826, 2480, 1977, 1649, 1159, 1598, 1972, 2280, 2026, 1732, 1167, 1884,
          1599, 1722, 2156, 1723, 1767, 1387, 1849, 2061, 2356, 1875, 1727, 2524, 1620,
          1926, 1507, 1902, 1999, 1914, 1514, 1699, 1095, 2081, 1632, 1520, 1578, 2329,
          985, 1681, 1719, 1836, 1306, 2122, 1726, 1804, 2020, 2076, 1235, 1074, 1970,
          1853, 1836, 1228, 1431, 2112, 1946, 2793, 1822, 2044, 1946, 2200, 1880]
        episode_reward: [9.0, 12.0, 7.0, 7.0, 5.0, 11.0, 7.0, 17.0, 10.0, 11.0, 9.0, 6.0,
          6.0, 6.0, 9.0, 4.0, 13.0, 11.0, 12.0, 7.0, 5.0, 10.0, 5.0, 10.0, 14.0, 4.0,
          10.0, 4.0, 10.0, 15.0, 7.0, 11.0, 9.0, 8.0, 11.0, 5.0, 8.0, 16.0, 13.0, 8.0,
          6.0, 12.0, 6.0, 9.0, 11.0, 13.0, 7.0, 4.0, 11.0, 7.0, 10.0, 15.0, 7.0, 15.0,
          6.0, 8.0, 10.0, 36.0, 8.0, 8.0, 14.0, 9.0, 11.0, 13.0, 15.0, 9.0, 8.0, 5.0,
          8.0, 11.0, 14.0, 11.0, 6.0, 14.0, 20.0, 4.0, 11.0, 8.0, 8.0, 12.0, 14.0, 10.0,
          10.0, 11.0, 10.0, 4.0, 4.0, 9.0, 8.0, 8.0, 4.0, 5.0, 10.0, 11.0, 20.0, 13.0,
          14.0, 8.0, 13.0, 9.0]
      num_faulty_episodes: 0
      policy_reward_max: {}
      policy_reward_mean: {}
      policy_reward_min: {}
      sampler_perf:
        mean_action_processing_ms: 0.6395067034460499
        mean_env_render_ms: 0.0
        mean_env_wait_ms: 7.840870172264184
        mean_inference_ms: 6.614064184668028
        mean_raw_obs_processing_ms: 2.9528597540097277
    time_this_iter_s: 11.578344583511353
    time_total_s: 149.75555968284607
    timers:
      sample_time_ms: 0.242
      synch_weights_time_ms: 0.027
      training_iteration_time_ms: 0.354
    timesteps_total: 389000
    training_iteration: 13
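As an illustration of what "fewer metrics" (item 2) could look like, here is a minimal sketch that whitelists a handful of keys from a nested result dict like the one above. The key names are taken from the dump; `flatten` and `summarize` are hypothetical helpers, not part of Ray's API:

```python
# Hypothetical sketch: trim RLlib's nested result dict down to a few
# whitelisted metrics for default console output. `flatten` and
# `summarize` are illustrative helpers, not Ray APIs.
DEFAULT_KEYS = [
    "training_iteration",
    "episode_reward_mean",
    "episodes_total",
    "timesteps_total",
    "time_total_s",
]


def flatten(d, prefix=""):
    """Flatten nested dicts into {'outer/inner': value} pairs."""
    out = {}
    for k, v in d.items():
        key = f"{prefix}/{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out


def summarize(result, keys=DEFAULT_KEYS):
    """Keep only the whitelisted metrics from a full result dict."""
    flat = flatten(result)
    return {k: flat[k] for k in keys if k in flat}


# A tiny excerpt of the result structure shown in this issue:
result = {
    "training_iteration": 13,
    "timesteps_total": 389000,
    "time_total_s": 149.75555968284607,
    "episodes_total": 1710,
    "episode_reward_mean": 9.77,
    "info": {"learner": {"default_policy": {"learner_stats": {"cur_lr": 0.0005}}}},
}
print(summarize(result))
```

A per-algorithm default (as floated in item 2) would just swap in a different key list.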
  4. After I changed AIR_VERBOSITY to 0, I started seeing the old output flow again:

    
    (base) ray@ip-10-0-2-182:~/default$ export AIR_VERBOSITY=0
    (base) ray@ip-10-0-2-182:~/default$ python learning_tests/run.py --yaml-sub-dir=impala --framework=torch
    WARNING:ray.rllib.utils.framework:Not importing JAX for test purposes.
    WARNING:ray.rllib.utils.framework:Not importing JAX for test purposes.
    abs_yaml_path=learning_tests/yaml_files/impala
    Will run the following yaml files:
    -> /home/ray/default/learning_tests/yaml_files/impala/impala-breakoutnoframeskip-v5.yaml
    2023-03-29 09:22:03,164 INFO worker.py:1415 -- Connecting to existing Ray cluster at address: 10.0.2.182:6379...
    2023-03-29 09:22:03,176 INFO worker.py:1609 -- Connected to Ray cluster. View the dashboard at https://console.anyscale-staging.com/api/v2/sessions/ses_wvy6cr3u1lcu2bj2qtuhk853v2/services?redirect_to=dashboard 
    2023-03-29 09:22:03,179 INFO packaging.py:346 -- Pushing file package 'gcs://_ray_pkg_ca40b9ada33c9d0e8f58332424084e2a.zip' (0.03MiB) to Ray cluster...
    2023-03-29 09:22:03,180 INFO packaging.py:359 -- Successfully pushed file package 'gcs://_ray_pkg_ca40b9ada33c9d0e8f58332424084e2a.zip'.
    Starting learning test iteration 0...
    == Test config ==
    impala-breakoutnoframeskip-v5-torch:
      config:
        clip_rewards: true
        env_config:
          frameskip: 1
          full_action_space: false
          repeat_action_probability: 0.0
        framework: torch
        lr: 0.0005
        num_envs_per_worker: 5
        num_gpus: 1
        num_workers: 10
        rollout_fragment_length: 50
        train_batch_size: 500
      env: ALE/Breakout-v5
      run: IMPALA
      stop:
        time_total_s: 2400

/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/progress_reporter.py:351: UserWarning: Both 'metric' and 'mode' must be set to be able to sort by metric. No sorting is performed. "Both 'metric' and 'mode' must be set to be able "
(pid=127980) 2023-03-29 09:22:12,160 WARNING framework.py:28 -- Not importing JAX for test purposes.
(pid=127980) 2023-03-29 09:22:12,483 WARNING framework.py:28 -- Not importing JAX for test purposes.
(Impala pid=127980) 2023-03-29 09:22:12,913 WARNING algorithm_config.py:637 -- Cannot create ImpalaConfig from given config_dict! Property __stdout_file__ not supported.
(Impala pid=127980) A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
(Impala pid=127980) [Powered by Stella]
(Impala pid=127980) 2023-03-29 09:22:13,519 INFO algorithm.py:528 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=128252) 2023-03-29 09:22:20,796 WARNING framework.py:28 -- Not importing JAX for test purposes.
(pid=128253) 2023-03-29 09:22:20,781 WARNING framework.py:28 -- Not importing JAX for test purposes.
(RolloutWorker pid=128252) A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
(RolloutWorker pid=128252) [Powered by Stella]
(RolloutWorker pid=128250) 2023-03-29 09:22:21,816 WARNING deprecation.py:51 -- DeprecationWarning: FrameStack has been deprecated. This will raise an error in the future!
== Status ==
Current time: 2023-03-29 09:22:05 (running for 00:00:02.06)
Using FIFO scheduling algorithm.
Logical resource usage: 11.0/32 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:M60)
Result logdir: /home/ray/ray_results/impala-breakoutnoframeskip-v5-torch
Number of trials: 1/1 (1 RUNNING)
+------------------------------------+----------+-------------------+-------------+
| Trial name                         | status   | loc               | framework   |
|------------------------------------+----------+-------------------+-------------|
| IMPALA_ALE_Breakout-v5_d70d1_00000 | RUNNING  | 10.0.2.182:127980 | torch       |
+------------------------------------+----------+-------------------+-------------+

(Impala pid=127980) 2023-03-29 09:22:25,927 INFO trainable.py:176 -- Trainable.setup took 12.410 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(Impala pid=127980) 2023-03-29 09:22:25,928 WARNING util.py:67 -- Install gputil for GPU system monitoring.
(pid=128258) 2023-03-29 09:22:21,337 WARNING framework.py:28 -- Not importing JAX for test purposes. [repeated 18x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication)
== Status ==
Current time: 2023-03-29 09:22:35 (running for 00:00:32.37)
Using FIFO scheduling algorithm.
Logical resource usage: 11.0/32 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:M60)
Result logdir: /home/ray/ray_results/impala-breakoutnoframeskip-v5-torch
Number of trials: 1/1 (1 RUNNING)
+------------------------------------+----------+-------------------+-------------+
| Trial name                         | status   | loc               | framework   |
|------------------------------------+----------+-------------------+-------------|
| IMPALA_ALE_Breakout-v5_d70d1_00000 | RUNNING  | 10.0.2.182:127980 | torch       |
+------------------------------------+----------+-------------------+-------------+

Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': 27.16767120361328, 'policy_loss': -2.7041698235955783e-25, 'entropy': 3.1160257634342076e-17, 'entropy_coeff': 0.01, 'var_gnorm': 9.948323249816895, 'vf_loss': 54.33534240722656, 'vf_explained_var': 0.025636136531829834}, 'model': {}, 'num_grad_updates_lifetime': 50.0, 'diff_num_grad_updates_vs_sampler_policy': 12.0}}, 'num_env_steps_sampled': 25250, 'num_env_steps_trained': 25000, 'num_agent_steps_sampled': 25250, 'num_agent_steps_trained': 25000, 'num_training_step_calls_since_last_synch_worker_weights': 79, 'num_weight_broadcasts': 79, 'num_samples_added_to_queue': 25000, 'learner_queue': {'size_count': 50, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 128.81, 'learner_load_time_ms': 5.331, 'learner_load_wait_time_ms': 53.762, 'learner_dequeue_time_ms': 2284.748}},sampler_results={'episode_reward_max': 6.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 1.2105263157894737, 'episode_len_mean': 717.4868421052631, 'episode_media': {}, 'episodes_this_iter': 152, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 4.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 
2.0, 2.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'episode_lengths': [516, 501, 806, 809, 801, 521, 523, 539, 516, 543, 525, 521, 517, 543, 531, 527, 532, 1406, 524, 809, 817, 800, 813, 817, 541, 531, 1414, 1430, 1414, 1406, 516, 791, 798, 805, 814, 819, 537, 536, 525, 517, 519, 1426, 1406, 516, 814, 812, 796, 988, 1016, 515, 536, 540, 529, 527, 1206, 531, 523, 1422, 540, 816, 800, 803, 804, 818, 535, 516, 543, 529, 543, 528, 1430, 531, 519, 1546, 794, 817, 790, 814, 816, 528, 528, 515, 537, 528, 1218, 519, 1228, 1224, 1220, 649, 790, 814, 819, 818, 813, 532, 516, 531, 515, 529, 516, 529, 1410, 529, 532, 800, 793, 791, 817, 790, 537, 540, 520, 529, 520, 527, 532, 1214, 515, 528, 515, 1212, 818, 813, 812, 814, 992, 528, 516, 528, 521, 528, 541, 535, 532, 540, 791, 796, 814, 794, 816, 521, 535, 539, 536, 532, 541, 527, 537, 531, 525, 533]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.1451655233545353, 'mean_inference_ms': 6.558934767208633, 'mean_action_processing_ms': 0.6327834007776333, 'mean_env_wait_ms': 7.896650866089233, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=6.0,episode_reward_min=0.0,episode_reward_mean=1.2105263157894737,episode_len_mean=717.4868421052631,episodes_this_iter=152,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 4.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 2.0, 2.0, 
2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'episode_lengths': [516, 501, 806, 809, 801, 521, 523, 539, 516, 543, 525, 521, 517, 543, 531, 527, 532, 1406, 524, 809, 817, 800, 813, 817, 541, 531, 1414, 1430, 1414, 1406, 516, 791, 798, 805, 814, 819, 537, 536, 525, 517, 519, 1426, 1406, 516, 814, 812, 796, 988, 1016, 515, 536, 540, 529, 527, 1206, 531, 523, 1422, 540, 816, 800, 803, 804, 818, 535, 516, 543, 529, 543, 528, 1430, 531, 519, 1546, 794, 817, 790, 814, 816, 528, 528, 515, 537, 528, 1218, 519, 1228, 1224, 1220, 649, 790, 814, 819, 818, 813, 532, 516, 531, 515, 529, 516, 529, 1410, 529, 532, 800, 793, 791, 817, 790, 537, 540, 520, 529, 520, 527, 532, 1214, 515, 528, 515, 1212, 818, 813, 812, 814, 992, 528, 516, 528, 521, 528, 541, 535, 532, 540, 791, 796, 814, 794, 816, 521, 535, 539, 536, 532, 541, 527, 537, 531, 525, 533]},sampler_perf={'mean_raw_obs_processing_ms': 3.1451655233545353, 'mean_inference_ms': 6.558934767208633, 'mean_action_processing_ms': 0.6327834007776333, 'mean_env_wait_ms': 7.896650866089233, 'mean_env_render_ms': 0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=20,num_remote_worker_restarts=0,num_agent_steps_sampled=25250,num_agent_steps_trained=25000,num_env_steps_sampled=25250,num_env_steps_trained=25000,num_env_steps_sampled_this_iter=25250,num_env_steps_trained_this_iter=25000,num_steps_trained_this_iter=25000,agent_timesteps_total=25250,timers={'training_iteration_time_ms': 0.348, 'sample_time_ms': 0.239, 'synch_weights_time_ms': 0.026},counters={'num_env_steps_sampled': 25250, 'num_env_steps_trained': 25000, 'num_agent_steps_sampled': 25250, 'num_agent_steps_trained': 25000, 'num_training_step_calls_since_last_synch_worker_weights': 79, 'num_weight_broadcasts': 79, 'num_samples_added_to_queue': 
25000},perf={'cpu_util_percent': 35.023529411764706, 'ram_util_percent': 5.2} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}. Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': 11.154390335083008, 'policy_loss': 0.0, 'entropy': 3.126514513951406e-08, 'entropy_coeff': 0.01, 'var_gnorm': 10.71700382232666, 'vf_loss': 22.308780670166016, 'vf_explained_var': 0.7661240696907043}, 'model': {}, 'num_grad_updates_lifetime': 110.0, 'diff_num_grad_updates_vs_sampler_policy': 13.0}}, 'num_env_steps_sampled': 55250, 'num_env_steps_trained': 55000, 'num_agent_steps_sampled': 55250, 'num_agent_steps_trained': 55000, 'num_training_step_calls_since_last_synch_worker_weights': 1036, 'num_weight_broadcasts': 143, 'num_samples_added_to_queue': 55000, 'learner_queue': {'size_count': 110, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 125.55, 'learner_load_time_ms': 5.345, 'learner_load_wait_time_ms': 55.379, 'learner_dequeue_time_ms': 2533.677}},sampler_results={'episode_reward_max': 9.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 2.3223684210526314, 'episode_len_mean': 835.8684210526316, 'episode_media': {}, 'episodes_this_iter': 152, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 0.0, 0.0, 5.0, 0.0, 9.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
9.0, 9.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0], 'episode_lengths': [540, 528, 515, 1422, 535, 1616, 1624, 1640, 1636, 536, 1640, 531, 535, 519, 528, 533, 520, 537, 540, 536, 540, 539, 1640, 519, 541, 520, 537, 524, 516, 1628, 1636, 535, 1410, 536, 539, 533, 528, 537, 1406, 521, 516, 516, 540, 529, 517, 528, 523, 516, 1620, 1636, 1224, 529, 527, 527, 515, 519, 1628, 1624, 1628, 519, 540, 520, 517, 543, 535, 520, 1308, 531, 515, 516, 1632, 1616, 536, 529, 1632, 1636, 516, 537, 517, 527, 535, 519, 524, 541, 515, 536, 520, 1644, 527, 539, 1616, 527, 1628, 541, 528, 1406, 525, 517, 529, 1426, 521, 520, 523, 516, 521, 528, 529, 1644, 1640, 515, 537, 524, 524, 517, 769, 531, 519, 1624, 1640, 531, 1644, 1620, 516, 516, 536, 536, 1426, 1418, 523, 1430, 531, 520, 535, 1620, 1644, 531, 543, 515, 1636, 1426, 529, 528, 1414, 1426, 533, 532, 541, 1644, 1620, 1640, 523, 540]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.0960928772947756, 'mean_inference_ms': 6.56534965209043, 'mean_action_processing_ms': 0.631634179682276, 'mean_env_wait_ms': 7.807842136017302, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=9.0,episode_reward_min=0.0,episode_reward_mean=2.3223684210526314,episode_len_mean=835.8684210526316,episodes_this_iter=152,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [0.0, 0.0, 0.0, 5.0, 0.0, 9.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 9.0, 9.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0], 'episode_lengths': [540, 528, 515, 1422, 535, 1616, 1624, 1640, 1636, 536, 1640, 531, 535, 519, 528, 533, 520, 537, 540, 536, 540, 539, 1640, 519, 541, 520, 537, 524, 516, 1628, 1636, 535, 1410, 536, 539, 533, 528, 537, 1406, 521, 516, 516, 540, 529, 517, 528, 523, 516, 1620, 1636, 1224, 529, 527, 527, 515, 519, 1628, 1624, 1628, 519, 540, 520, 517, 543, 535, 520, 1308, 531, 515, 516, 1632, 1616, 536, 529, 1632, 1636, 516, 537, 517, 527, 535, 519, 524, 541, 515, 536, 520, 1644, 527, 539, 1616, 527, 1628, 541, 528, 1406, 525, 517, 529, 1426, 521, 520, 523, 516, 521, 528, 529, 1644, 1640, 515, 537, 524, 524, 517, 769, 531, 519, 1624, 1640, 531, 1644, 1620, 516, 516, 536, 536, 1426, 1418, 523, 1430, 531, 520, 535, 1620, 1644, 531, 543, 515, 1636, 1426, 529, 528, 1414, 1426, 533, 532, 541, 1644, 1620, 1640, 523, 540]},sampler_perf={'mean_raw_obs_processing_ms': 3.0960928772947756, 'mean_inference_ms': 6.56534965209043, 'mean_action_processing_ms': 0.631634179682276, 'mean_env_wait_ms': 7.807842136017302, 'mean_env_render_ms': 
0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=20,num_remote_worker_restarts=0,num_agent_steps_sampled=55250,num_agent_steps_trained=55000,num_env_steps_sampled=55250,num_env_steps_trained=55000,num_env_steps_sampled_this_iter=30000,num_env_steps_trained_this_iter=30000,num_steps_trained_this_iter=30000,agent_timesteps_total=55250,timers={'training_iteration_time_ms': 0.347, 'sample_time_ms': 0.235, 'synch_weights_time_ms': 0.026},counters={'num_env_steps_sampled': 55250, 'num_env_steps_trained': 55000, 'num_agent_steps_sampled': 55250, 'num_agent_steps_trained': 55000, 'num_training_step_calls_since_last_synch_worker_weights': 1036, 'num_weight_broadcasts': 143, 'num_samples_added_to_queue': 55000},perf={'cpu_util_percent': 34.7, 'ram_util_percent': 5.24375} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}. 
Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': -1.586639165878296, 'policy_loss': 1.0538454055786133, 'entropy': 0.7403178215026855, 'entropy_coeff': 0.01, 'var_gnorm': 11.36281967163086, 'vf_loss': 1.9741454124450684, 'vf_explained_var': 0.9754490852355957}, 'model': {}, 'num_grad_updates_lifetime': 170.0, 'diff_num_grad_updates_vs_sampler_policy': 11.5}}, 'num_env_steps_sampled': 85500, 'num_env_steps_trained': 85000, 'num_agent_steps_sampled': 85500, 'num_agent_steps_trained': 85000, 'num_training_step_calls_since_last_synch_worker_weights': 0, 'num_weight_broadcasts': 222, 'num_samples_added_to_queue': 85500, 'learner_queue': {'size_count': 171, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 119.573, 'learner_load_time_ms': 4.85, 'learner_load_wait_time_ms': 60.447, 'learner_dequeue_time_ms': 2872.035}},sampler_results={'episode_reward_max': 9.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 2.704697986577181, 'episode_len_mean': 860.6912751677852, 'episode_media': {}, 'episodes_this_iter': 149, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 6.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 1.0, 0.0, 9.0, 
0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0], 'episode_lengths': [536, 1640, 524, 524, 1616, 531, 1624, 1628, 543, 540, 533, 540, 1616, 531, 517, 516, 519, 527, 540, 516, 525, 529, 541, 529, 536, 1624, 1624, 1620, 528, 528, 536, 1620, 536, 1628, 1616, 525, 1628, 543, 1628, 527, 532, 525, 1640, 520, 537, 515, 520, 1624, 1620, 532, 525, 521, 515, 1628, 1628, 523, 540, 539, 1632, 541, 528, 1640, 1624, 515, 1620, 1620, 516, 541, 519, 529, 517, 521, 520, 1616, 529, 543, 539, 515, 535, 515, 537, 1616, 1616, 517, 539, 521, 515, 523, 519, 528, 527, 533, 528, 536, 524, 540, 1620, 517, 527, 1632, 536, 1628, 1620, 525, 508, 537, 1628, 1644, 1636, 521, 536, 525, 1636, 527, 533, 525, 519, 528, 527, 1628, 519, 637, 536, 1640, 528, 529, 1644, 1644, 517, 520, 541, 1644, 527, 519, 527, 516, 529, 537, 1632, 1632, 528, 1616, 1640, 1628, 524, 1636, 533, 536, 511]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.0718346838518067, 'mean_inference_ms': 6.57333212573228, 'mean_action_processing_ms': 0.6328219398377659, 'mean_env_wait_ms': 7.789589344047068, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=9.0,episode_reward_min=0.0,episode_reward_mean=2.704697986577181,episode_len_mean=860.6912751677852,episodes_this_iter=149,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 6.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 
0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 1.0, 0.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0], 'episode_lengths': [536, 1640, 524, 524, 1616, 531, 1624, 1628, 543, 540, 533, 540, 1616, 531, 517, 516, 519, 527, 540, 516, 525, 529, 541, 529, 536, 1624, 1624, 1620, 528, 528, 536, 1620, 536, 1628, 1616, 525, 1628, 543, 1628, 527, 532, 525, 1640, 520, 537, 515, 520, 1624, 1620, 532, 525, 521, 515, 1628, 1628, 523, 540, 539, 1632, 541, 528, 1640, 1624, 515, 1620, 1620, 516, 541, 519, 529, 517, 521, 520, 1616, 529, 543, 539, 515, 535, 515, 537, 1616, 1616, 517, 539, 521, 515, 523, 519, 528, 527, 533, 528, 536, 524, 540, 1620, 517, 527, 1632, 536, 1628, 1620, 525, 508, 537, 1628, 1644, 1636, 521, 536, 525, 1636, 527, 533, 525, 519, 528, 527, 1628, 519, 637, 536, 1640, 528, 529, 1644, 1644, 517, 520, 541, 1644, 527, 519, 527, 516, 529, 537, 1632, 1632, 528, 1616, 1640, 1628, 524, 1636, 533, 536, 511]},sampler_perf={'mean_raw_obs_processing_ms': 3.0718346838518067, 'mean_inference_ms': 6.57333212573228, 'mean_action_processing_ms': 0.6328219398377659, 'mean_env_wait_ms': 7.789589344047068, 'mean_env_render_ms': 0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=19,num_remote_worker_restarts=0,num_agent_steps_sampled=85500,num_agent_steps_trained=85000,num_env_steps_sampled=85500,num_env_steps_trained=85000,num_env_steps_sampled_this_iter=30250,num_env_steps_trained_this_iter=30000,num_steps_trained_this_iter=30000,agent_timesteps_total=85500,timers={'training_iteration_time_ms': 1.886, 'sample_time_ms': 0.337, 'synch_weights_time_ms': 1.013},counters={'num_env_steps_sampled': 85500, 'num_env_steps_trained': 85000, 'num_agent_steps_sampled': 85500, 'num_agent_steps_trained': 85000, 'num_training_step_calls_since_last_synch_worker_weights': 0, 'num_weight_broadcasts': 222, 'num_samples_added_to_queue': 
85500},perf={'cpu_util_percent': 35.34375, 'ram_util_percent': 5.225} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}. == Status == Current time: 2023-03-29 09:23:10 (running for 00:01:06.85) Using FIFO scheduling algorithm. Logical resource usage: 11.0/32 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:M60) Result logdir: /home/ray/ray_results/impala-breakoutnoframeskip-v5-torch Number of trials: 1/1 (1 RUNNING) +------------------------------------+----------+-------------------+-------------+--------+----------------+----------------+----------------+------------------+---------------+ | Trial name | status | loc | framework | iter | time_total_s | ts (sampled) | ts (trained) | train_episodes | reward_mean | |------------------------------------+----------+-------------------+-------------+--------+----------------+----------------+----------------+------------------+---------------| | IMPALA_ALE_Breakout-v5_d70d1_00000 | RUNNING | 10.0.2.182:127980 | torch | 3 | 34.3609 | 85500 | 85000 | 149 | 2.7047 | +------------------------------------+----------+-------------------+-------------+--------+----------------+----------------+----------------+------------------+---------------+

Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': 13.279269218444824, 'policy_loss': 14.329699516296387, 'entropy': 0.9800087213516235, 'entropy_coeff': 0.01, 'var_gnorm': 12.450754165649414, 'vf_loss': 7.5032267570495605, 'vf_explained_var': 0.7622262835502625}, 'model': {}, 'num_grad_updates_lifetime': 230.0, 'diff_num_grad_updates_vs_sampler_policy': 11.5}}, 'num_env_steps_sampled': 115250, 'num_env_steps_trained': 115000, 'num_agent_steps_sampled': 115250, 'num_agent_steps_trained': 115000, 'num_training_step_calls_since_last_synch_worker_weights': 755, 'num_weight_broadcasts': 301, 'num_samples_added_to_queue': 115000, 'learner_queue': {'size_count': 230, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 124.085, 'learner_load_time_ms': 4.89, 'learner_load_wait_time_ms': 58.161, 'learner_dequeue_time_ms': 2890.702}},sampler_results={'episode_reward_max': 10.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 1.6746987951807228, 'episode_len_mean': 777.6144578313254, 'episode_media': {}, 'episodes_this_iter': 166, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [4.0, 2.0, 0.0, 3.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 3.0, 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 3.0, 3.0, 4.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 1.0, 5.0, 0.0, 4.0, 0.0, 9.0, 4.0, 3.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 5.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 10.0, 1.0, 4.0, 5.0, 0.0, 0.0, 3.0, 0.0, 3.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 6.0, 0.0, 1.0, 3.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 4.0, 3.0, 2.0, 2.0, 
0.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 6.0, 9.0, 0.0, 6.0, 3.0, 3.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 3.0, 2.0, 0.0, 0.0, 3.0, 2.0, 10.0, 1.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 3.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0], 'episode_lengths': [1214, 834, 516, 1022, 842, 492, 811, 806, 520, 1138, 513, 931, 822, 517, 812, 515, 525, 1202, 1026, 1014, 1202, 504, 636, 509, 809, 795, 516, 507, 512, 519, 810, 537, 810, 711, 1218, 526, 517, 830, 509, 835, 833, 829, 529, 648, 1317, 512, 1212, 532, 1624, 1207, 939, 543, 542, 830, 635, 508, 628, 516, 817, 507, 520, 1111, 1312, 795, 543, 529, 623, 720, 631, 809, 505, 532, 517, 503, 1144, 504, 520, 1300, 1296, 525, 503, 705, 507, 1730, 704, 1218, 1422, 514, 533, 872, 519, 935, 805, 520, 513, 830, 509, 718, 507, 1610, 533, 641, 1015, 814, 522, 839, 517, 817, 523, 620, 523, 812, 529, 524, 962, 995, 1004, 524, 1210, 1010, 806, 838, 517, 802, 790, 799, 697, 531, 536, 511, 890, 621, 1525, 1620, 520, 1611, 1018, 1026, 522, 761, 645, 804, 895, 515, 933, 814, 508, 535, 1000, 811, 1822, 697, 507, 512, 1230, 818, 819, 796, 907, 712, 532, 908, 531, 523, 525, 1205]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.094785060279142, 'mean_inference_ms': 6.571849059434162, 'mean_action_processing_ms': 0.6336455005969693, 'mean_env_wait_ms': 7.782669379977965, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=10.0,episode_reward_min=0.0,episode_reward_mean=1.6746987951807228,episode_len_mean=777.6144578313254,episodes_this_iter=166,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [4.0, 2.0, 0.0, 3.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 3.0, 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 3.0, 3.0, 4.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 1.0, 5.0, 0.0, 4.0, 0.0, 9.0, 4.0, 3.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 5.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 
0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 10.0, 1.0, 4.0, 5.0, 0.0, 0.0, 3.0, 0.0, 3.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 6.0, 0.0, 1.0, 3.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 4.0, 3.0, 2.0, 2.0, 0.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 6.0, 9.0, 0.0, 6.0, 3.0, 3.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 3.0, 2.0, 0.0, 0.0, 3.0, 2.0, 10.0, 1.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 3.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0], 'episode_lengths': [1214, 834, 516, 1022, 842, 492, 811, 806, 520, 1138, 513, 931, 822, 517, 812, 515, 525, 1202, 1026, 1014, 1202, 504, 636, 509, 809, 795, 516, 507, 512, 519, 810, 537, 810, 711, 1218, 526, 517, 830, 509, 835, 833, 829, 529, 648, 1317, 512, 1212, 532, 1624, 1207, 939, 543, 542, 830, 635, 508, 628, 516, 817, 507, 520, 1111, 1312, 795, 543, 529, 623, 720, 631, 809, 505, 532, 517, 503, 1144, 504, 520, 1300, 1296, 525, 503, 705, 507, 1730, 704, 1218, 1422, 514, 533, 872, 519, 935, 805, 520, 513, 830, 509, 718, 507, 1610, 533, 641, 1015, 814, 522, 839, 517, 817, 523, 620, 523, 812, 529, 524, 962, 995, 1004, 524, 1210, 1010, 806, 838, 517, 802, 790, 799, 697, 531, 536, 511, 890, 621, 1525, 1620, 520, 1611, 1018, 1026, 522, 761, 645, 804, 895, 515, 933, 814, 508, 535, 1000, 811, 1822, 697, 507, 512, 1230, 818, 819, 796, 907, 712, 532, 908, 531, 523, 525, 1205]},sampler_perf={'mean_raw_obs_processing_ms': 3.094785060279142, 'mean_inference_ms': 6.571849059434162, 'mean_action_processing_ms': 0.6336455005969693, 'mean_env_wait_ms': 7.782669379977965, 'mean_env_render_ms': 0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=20,num_remote_worker_restarts=0,num_agent_steps_sampled=115250,num_agent_steps_trained=115000,num_env_steps_sampled=115250,num_env_steps_trained=115000,num_env_steps_sampled_this_iter=29750,num_env_steps_trained_this_iter=30000,num_steps_trained_this_iter=30000,agent_timesteps_total=115250,timers={'training_iteration_time_ms': 0.367, 
'sample_time_ms': 0.255, 'synch_weights_time_ms': 0.028},counters={'num_env_steps_sampled': 115250, 'num_env_steps_trained': 115000, 'num_agent_steps_sampled': 115250, 'num_agent_steps_trained': 115000, 'num_training_step_calls_since_last_synch_worker_weights': 755, 'num_weight_broadcasts': 301, 'num_samples_added_to_queue': 115000},perf={'cpu_util_percent': 34.54117647058823, 'ram_util_percent': 5.2294117647058815} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}.
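The dump above shows the core problem: the trial report is a deeply nested dict with dozens of metrics, most of which are noise for a default view. One way to address point 2 would be to flatten the nested result and keep only an allowlisted handful of keys before printing. The sketch below is purely illustrative — the helper names and the `DEFAULT_METRICS` allowlist are assumptions, not part of the Ray API; key names mirror the log above.

```python
# Hypothetical sketch: reduce a nested RLlib-style result dict to a short
# allowlist of metrics before printing. Names here are illustrative only.

DEFAULT_METRICS = [
    "episode_reward_mean",
    "episode_len_mean",
    "episodes_this_iter",
    "counters/num_env_steps_sampled",
    "counters/num_env_steps_trained",
]

def flatten(d, prefix=""):
    """Flatten a nested dict into {'a/b/c': value} form."""
    out = {}
    for k, v in d.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, prefix=key + "/"))
        else:
            out[key] = v
    return out

def summarize(result, keys=DEFAULT_METRICS):
    """Keep only the allowlisted metrics, in allowlist order."""
    flat = flatten(result)
    return {k: flat[k] for k in keys if k in flat}

# Tiny stand-in for the trial result dumped above.
result = {
    "episode_reward_mean": 1.67,
    "episode_len_mean": 777.6,
    "episodes_this_iter": 166,
    "hist_stats": {"episode_reward": [4.0, 2.0, 0.0]},  # dropped by the allowlist
    "counters": {
        "num_env_steps_sampled": 115250,
        "num_env_steps_trained": 115000,
    },
}
print(summarize(result))
```

With a per-algorithm allowlist, RLlib could ship sensible defaults while still letting users opt into the full dump at higher verbosity levels.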



### Versions / Dependencies

nightly

### Reproduction script

learning_tests_impala_torch

### Issue Severity

Low: It annoys or frustrates me.
xwjiang2010 commented 1 year ago

For 1 and 2, can you check with the RLlib folks what the preferred output is? I am not sure what should replace "iteration".

Item 4 will be fixed by https://github.com/ray-project/ray/pull/33871

scottsun94 commented 1 year ago

@gjoliver @kouroshHakha can you comment on items 1 and 2?