ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[AIR output] UX issues of new AIR output for learning_tests_impala_torch (AIR_VERBOSITY=0, 1, 2) #33867

Closed · scottsun94 closed this issue 1 year ago

scottsun94 commented 1 year ago

What happened + What you expected to happen

  1. It seems that we do show "iteration". I'm not sure whether that's a good thing, since users may not be familiar with the "iteration" concept.

  2. We should ask the RLlib team to choose fewer metrics for the default output (it's unclear whether the selection should be customized per algorithm).

  3. The format of the output should be updated, as mentioned in other related issues:

    Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)
    agent_timesteps_total: 389000
    connector_metrics: {}
    counters:
      num_agent_steps_sampled: 389000
      num_agent_steps_trained: 388500
      num_env_steps_sampled: 389000
      num_env_steps_trained: 388500
      num_samples_added_to_queue: 389000
      num_training_step_calls_since_last_synch_worker_weights: 134
      num_weight_broadcasts: 964
    custom_metrics: {}
    episode_len_mean: 1721.88
    episode_media: {}
    episode_reward_max: 36.0
    episode_reward_mean: 9.77
    episode_reward_min: 4.0
    episodes_this_iter: 67
    episodes_total: 1710
    info:
      learner:
        default_policy:
          custom_metrics: {}
          diff_num_grad_updates_vs_sampler_policy: 10.0
          learner_stats:
            cur_lr: 0.0005
            entropy: 1.0906739234924316
            entropy_coeff: 0.01
            policy_loss: -32.83232116699219
            total_loss: -28.335006713867188
            var_gnorm: 16.424108505249023
            vf_explained_var: 0.6170328259468079
            vf_loss: 19.683231353759766
          model: {}
          num_grad_updates_lifetime: 777.0
      learner_queue:
        size_count: 778
        size_mean: 0.0
        size_quantiles: [0.0, 0.0, 0.0, 0.0, 0.0]
        size_std: 0.0
      num_agent_steps_sampled: 389000
      num_agent_steps_trained: 388500
      num_env_steps_sampled: 389000
      num_env_steps_trained: 388500
      num_samples_added_to_queue: 389000
      num_training_step_calls_since_last_synch_worker_weights: 134
      num_weight_broadcasts: 964
      timing_breakdown:
        learner_dequeue_time_ms: 2772.957
        learner_grad_time_ms: 123.634
        learner_load_time_ms: 4.319
        learner_load_wait_time_ms: 47.829
    num_agent_steps_sampled: 389000
    num_agent_steps_trained: 388500
    num_env_steps_sampled: 389000
    num_env_steps_sampled_this_iter: 30750
    num_env_steps_trained: 388500
    num_env_steps_trained_this_iter: 31000
    num_faulty_episodes: 0
    num_healthy_workers: 10
    num_in_flight_async_reqs: 20
    num_remote_worker_restarts: 0
    num_steps_trained_this_iter: 31000
    perf:
      cpu_util_percent: 34.94117647058823
      ram_util_percent: 5.211764705882353
    policy_reward_max: {}
    policy_reward_mean: {}
    policy_reward_min: {}
    sampler_perf:
      mean_action_processing_ms: 0.6395067034460499
      mean_env_render_ms: 0.0
      mean_env_wait_ms: 7.840870172264184
      mean_inference_ms: 6.614064184668028
      mean_raw_obs_processing_ms: 2.9528597540097277
    sampler_results:
      connector_metrics: {}
      custom_metrics: {}
      episode_len_mean: 1721.88
      episode_media: {}
      episode_reward_max: 36.0
      episode_reward_mean: 9.77
      episode_reward_min: 4.0
      episodes_this_iter: 67
      hist_stats:
        episode_lengths: [1414, 1306, 1641, 1446, 1234, 2026, 1600, 2454, 1359, 1572,
          1411, 1471, 1463, 1269, 1347, 1083, 2344, 1095, 1956, 1603, 1255, 2218, 1208,
          1943, 1483, 1158, 2108, 1073, 1535, 2590, 1804, 1802, 2109, 1783, 1099, 1258,
          1211, 1826, 2480, 1977, 1649, 1159, 1598, 1972, 2280, 2026, 1732, 1167, 1884,
          1599, 1722, 2156, 1723, 1767, 1387, 1849, 2061, 2356, 1875, 1727, 2524, 1620,
          1926, 1507, 1902, 1999, 1914, 1514, 1699, 1095, 2081, 1632, 1520, 1578, 2329,
          985, 1681, 1719, 1836, 1306, 2122, 1726, 1804, 2020, 2076, 1235, 1074, 1970,
          1853, 1836, 1228, 1431, 2112, 1946, 2793, 1822, 2044, 1946, 2200, 1880]
        episode_reward: [9.0, 12.0, 7.0, 7.0, 5.0, 11.0, 7.0, 17.0, 10.0, 11.0, 9.0, 6.0,
          6.0, 6.0, 9.0, 4.0, 13.0, 11.0, 12.0, 7.0, 5.0, 10.0, 5.0, 10.0, 14.0, 4.0,
          10.0, 4.0, 10.0, 15.0, 7.0, 11.0, 9.0, 8.0, 11.0, 5.0, 8.0, 16.0, 13.0, 8.0,
          6.0, 12.0, 6.0, 9.0, 11.0, 13.0, 7.0, 4.0, 11.0, 7.0, 10.0, 15.0, 7.0, 15.0,
          6.0, 8.0, 10.0, 36.0, 8.0, 8.0, 14.0, 9.0, 11.0, 13.0, 15.0, 9.0, 8.0, 5.0,
          8.0, 11.0, 14.0, 11.0, 6.0, 14.0, 20.0, 4.0, 11.0, 8.0, 8.0, 12.0, 14.0, 10.0,
          10.0, 11.0, 10.0, 4.0, 4.0, 9.0, 8.0, 8.0, 4.0, 5.0, 10.0, 11.0, 20.0, 13.0,
          14.0, 8.0, 13.0, 9.0]
      num_faulty_episodes: 0
      policy_reward_max: {}
      policy_reward_mean: {}
      policy_reward_min: {}
      sampler_perf:
        mean_action_processing_ms: 0.6395067034460499
        mean_env_render_ms: 0.0
        mean_env_wait_ms: 7.840870172264184
        mean_inference_ms: 6.614064184668028
        mean_raw_obs_processing_ms: 2.9528597540097277
    time_this_iter_s: 11.578344583511353
    time_total_s: 149.75555968284607
    timers:
      sample_time_ms: 0.242
      synch_weights_time_ms: 0.027
      training_iteration_time_ms: 0.354
    timesteps_total: 389000
    training_iteration: 13
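As an illustration of what "fewer metrics" (item 2) could look like, here is a minimal sketch that whitelists a handful of keys from a nested result dict like the one above. The key names are taken from the dump; `flatten` and `summarize` are hypothetical helpers, not part of Ray's API:

```python
# Hypothetical sketch: trim RLlib's nested result dict down to a few
# whitelisted metrics for default console output. `flatten` and
# `summarize` are illustrative helpers, not Ray APIs.
DEFAULT_KEYS = [
    "training_iteration",
    "episode_reward_mean",
    "episodes_total",
    "timesteps_total",
    "time_total_s",
]


def flatten(d, prefix=""):
    """Flatten nested dicts into {'outer/inner': value} pairs."""
    out = {}
    for k, v in d.items():
        key = f"{prefix}/{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out


def summarize(result, keys=DEFAULT_KEYS):
    """Keep only the whitelisted metrics from a full result dict."""
    flat = flatten(result)
    return {k: flat[k] for k in keys if k in flat}


# A tiny excerpt of the result structure shown in this issue:
result = {
    "training_iteration": 13,
    "timesteps_total": 389000,
    "time_total_s": 149.75555968284607,
    "episodes_total": 1710,
    "episode_reward_mean": 9.77,
    "info": {"learner": {"default_policy": {"learner_stats": {"cur_lr": 0.0005}}}},
}
print(summarize(result))
```

A per-algorithm default (as floated in item 2) would just swap in a different key list.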
  4. After I changed AIR_VERBOSITY to 0, I started seeing the old output flow again:

    
    (base) ray@ip-10-0-2-182:~/default$ export AIR_VERBOSITY=0
    (base) ray@ip-10-0-2-182:~/default$ python learning_tests/run.py --yaml-sub-dir=impala --framework=torch
    WARNING:ray.rllib.utils.framework:Not importing JAX for test purposes.
    WARNING:ray.rllib.utils.framework:Not importing JAX for test purposes.
    abs_yaml_path=learning_tests/yaml_files/impala
    Will run the following yaml files:
    -> /home/ray/default/learning_tests/yaml_files/impala/impala-breakoutnoframeskip-v5.yaml
    2023-03-29 09:22:03,164 INFO worker.py:1415 -- Connecting to existing Ray cluster at address: 10.0.2.182:6379...
    2023-03-29 09:22:03,176 INFO worker.py:1609 -- Connected to Ray cluster. View the dashboard at https://console.anyscale-staging.com/api/v2/sessions/ses_wvy6cr3u1lcu2bj2qtuhk853v2/services?redirect_to=dashboard 
    2023-03-29 09:22:03,179 INFO packaging.py:346 -- Pushing file package 'gcs://_ray_pkg_ca40b9ada33c9d0e8f58332424084e2a.zip' (0.03MiB) to Ray cluster...
    2023-03-29 09:22:03,180 INFO packaging.py:359 -- Successfully pushed file package 'gcs://_ray_pkg_ca40b9ada33c9d0e8f58332424084e2a.zip'.
    Starting learning test iteration 0...
    == Test config ==
    impala-breakoutnoframeskip-v5-torch:
      config:
        clip_rewards: true
        env_config:
          frameskip: 1
          full_action_space: false
          repeat_action_probability: 0.0
        framework: torch
        lr: 0.0005
        num_envs_per_worker: 5
        num_gpus: 1
        num_workers: 10
        rollout_fragment_length: 50
        train_batch_size: 500
      env: ALE/Breakout-v5
      run: IMPALA
      stop:
        time_total_s: 2400

/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/progress_reporter.py:351: UserWarning: Both 'metric' and 'mode' must be set to be able to sort by metric. No sorting is performed. "Both 'metric' and 'mode' must be set to be able "
(pid=127980) 2023-03-29 09:22:12,160 WARNING framework.py:28 -- Not importing JAX for test purposes.
(pid=127980) 2023-03-29 09:22:12,483 WARNING framework.py:28 -- Not importing JAX for test purposes.
(Impala pid=127980) 2023-03-29 09:22:12,913 WARNING algorithm_config.py:637 -- Cannot create ImpalaConfig from given config_dict! Property __stdout_file__ not supported.
(Impala pid=127980) A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
(Impala pid=127980) [Powered by Stella]
(Impala pid=127980) 2023-03-29 09:22:13,519 INFO algorithm.py:528 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=128252) 2023-03-29 09:22:20,796 WARNING framework.py:28 -- Not importing JAX for test purposes.
(pid=128253) 2023-03-29 09:22:20,781 WARNING framework.py:28 -- Not importing JAX for test purposes.
(RolloutWorker pid=128252) A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
(RolloutWorker pid=128252) [Powered by Stella]
(RolloutWorker pid=128250) 2023-03-29 09:22:21,816 WARNING deprecation.py:51 -- DeprecationWarning: FrameStack has been deprecated. This will raise an error in the future!
== Status ==
Current time: 2023-03-29 09:22:05 (running for 00:00:02.06)
Using FIFO scheduling algorithm.
Logical resource usage: 11.0/32 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:M60)
Result logdir: /home/ray/ray_results/impala-breakoutnoframeskip-v5-torch
Number of trials: 1/1 (1 RUNNING)
+------------------------------------+----------+-------------------+-------------+
| Trial name                         | status   | loc               | framework   |
|------------------------------------+----------+-------------------+-------------|
| IMPALA_ALE_Breakout-v5_d70d1_00000 | RUNNING  | 10.0.2.182:127980 | torch       |
+------------------------------------+----------+-------------------+-------------+

(Impala pid=127980) 2023-03-29 09:22:25,927 INFO trainable.py:176 -- Trainable.setup took 12.410 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(Impala pid=127980) 2023-03-29 09:22:25,928 WARNING util.py:67 -- Install gputil for GPU system monitoring.
(pid=128258) 2023-03-29 09:22:21,337 WARNING framework.py:28 -- Not importing JAX for test purposes. [repeated 18x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication)
== Status ==
Current time: 2023-03-29 09:22:35 (running for 00:00:32.37)
Using FIFO scheduling algorithm.
Logical resource usage: 11.0/32 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:M60)
Result logdir: /home/ray/ray_results/impala-breakoutnoframeskip-v5-torch
Number of trials: 1/1 (1 RUNNING)
+------------------------------------+----------+-------------------+-------------+
| Trial name                         | status   | loc               | framework   |
|------------------------------------+----------+-------------------+-------------|
| IMPALA_ALE_Breakout-v5_d70d1_00000 | RUNNING  | 10.0.2.182:127980 | torch       |
+------------------------------------+----------+-------------------+-------------+

Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': 27.16767120361328, 'policy_loss': -2.7041698235955783e-25, 'entropy': 3.1160257634342076e-17, 'entropy_coeff': 0.01, 'var_gnorm': 9.948323249816895, 'vf_loss': 54.33534240722656, 'vf_explained_var': 0.025636136531829834}, 'model': {}, 'num_grad_updates_lifetime': 50.0, 'diff_num_grad_updates_vs_sampler_policy': 12.0}}, 'num_env_steps_sampled': 25250, 'num_env_steps_trained': 25000, 'num_agent_steps_sampled': 25250, 'num_agent_steps_trained': 25000, 'num_training_step_calls_since_last_synch_worker_weights': 79, 'num_weight_broadcasts': 79, 'num_samples_added_to_queue': 25000, 'learner_queue': {'size_count': 50, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 128.81, 'learner_load_time_ms': 5.331, 'learner_load_wait_time_ms': 53.762, 'learner_dequeue_time_ms': 2284.748}},sampler_results={'episode_reward_max': 6.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 1.2105263157894737, 'episode_len_mean': 717.4868421052631, 'episode_media': {}, 'episodes_this_iter': 152, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 4.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 
2.0, 2.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'episode_lengths': [516, 501, 806, 809, 801, 521, 523, 539, 516, 543, 525, 521, 517, 543, 531, 527, 532, 1406, 524, 809, 817, 800, 813, 817, 541, 531, 1414, 1430, 1414, 1406, 516, 791, 798, 805, 814, 819, 537, 536, 525, 517, 519, 1426, 1406, 516, 814, 812, 796, 988, 1016, 515, 536, 540, 529, 527, 1206, 531, 523, 1422, 540, 816, 800, 803, 804, 818, 535, 516, 543, 529, 543, 528, 1430, 531, 519, 1546, 794, 817, 790, 814, 816, 528, 528, 515, 537, 528, 1218, 519, 1228, 1224, 1220, 649, 790, 814, 819, 818, 813, 532, 516, 531, 515, 529, 516, 529, 1410, 529, 532, 800, 793, 791, 817, 790, 537, 540, 520, 529, 520, 527, 532, 1214, 515, 528, 515, 1212, 818, 813, 812, 814, 992, 528, 516, 528, 521, 528, 541, 535, 532, 540, 791, 796, 814, 794, 816, 521, 535, 539, 536, 532, 541, 527, 537, 531, 525, 533]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.1451655233545353, 'mean_inference_ms': 6.558934767208633, 'mean_action_processing_ms': 0.6327834007776333, 'mean_env_wait_ms': 7.896650866089233, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=6.0,episode_reward_min=0.0,episode_reward_mean=1.2105263157894737,episode_len_mean=717.4868421052631,episodes_this_iter=152,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 2.0, 2.0, 2.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 5.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 4.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 2.0, 2.0, 
2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'episode_lengths': [516, 501, 806, 809, 801, 521, 523, 539, 516, 543, 525, 521, 517, 543, 531, 527, 532, 1406, 524, 809, 817, 800, 813, 817, 541, 531, 1414, 1430, 1414, 1406, 516, 791, 798, 805, 814, 819, 537, 536, 525, 517, 519, 1426, 1406, 516, 814, 812, 796, 988, 1016, 515, 536, 540, 529, 527, 1206, 531, 523, 1422, 540, 816, 800, 803, 804, 818, 535, 516, 543, 529, 543, 528, 1430, 531, 519, 1546, 794, 817, 790, 814, 816, 528, 528, 515, 537, 528, 1218, 519, 1228, 1224, 1220, 649, 790, 814, 819, 818, 813, 532, 516, 531, 515, 529, 516, 529, 1410, 529, 532, 800, 793, 791, 817, 790, 537, 540, 520, 529, 520, 527, 532, 1214, 515, 528, 515, 1212, 818, 813, 812, 814, 992, 528, 516, 528, 521, 528, 541, 535, 532, 540, 791, 796, 814, 794, 816, 521, 535, 539, 536, 532, 541, 527, 537, 531, 525, 533]},sampler_perf={'mean_raw_obs_processing_ms': 3.1451655233545353, 'mean_inference_ms': 6.558934767208633, 'mean_action_processing_ms': 0.6327834007776333, 'mean_env_wait_ms': 7.896650866089233, 'mean_env_render_ms': 0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=20,num_remote_worker_restarts=0,num_agent_steps_sampled=25250,num_agent_steps_trained=25000,num_env_steps_sampled=25250,num_env_steps_trained=25000,num_env_steps_sampled_this_iter=25250,num_env_steps_trained_this_iter=25000,num_steps_trained_this_iter=25000,agent_timesteps_total=25250,timers={'training_iteration_time_ms': 0.348, 'sample_time_ms': 0.239, 'synch_weights_time_ms': 0.026},counters={'num_env_steps_sampled': 25250, 'num_env_steps_trained': 25000, 'num_agent_steps_sampled': 25250, 'num_agent_steps_trained': 25000, 'num_training_step_calls_since_last_synch_worker_weights': 79, 'num_weight_broadcasts': 79, 'num_samples_added_to_queue': 
25000},perf={'cpu_util_percent': 35.023529411764706, 'ram_util_percent': 5.2} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}. Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': 11.154390335083008, 'policy_loss': 0.0, 'entropy': 3.126514513951406e-08, 'entropy_coeff': 0.01, 'var_gnorm': 10.71700382232666, 'vf_loss': 22.308780670166016, 'vf_explained_var': 0.7661240696907043}, 'model': {}, 'num_grad_updates_lifetime': 110.0, 'diff_num_grad_updates_vs_sampler_policy': 13.0}}, 'num_env_steps_sampled': 55250, 'num_env_steps_trained': 55000, 'num_agent_steps_sampled': 55250, 'num_agent_steps_trained': 55000, 'num_training_step_calls_since_last_synch_worker_weights': 1036, 'num_weight_broadcasts': 143, 'num_samples_added_to_queue': 55000, 'learner_queue': {'size_count': 110, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 125.55, 'learner_load_time_ms': 5.345, 'learner_load_wait_time_ms': 55.379, 'learner_dequeue_time_ms': 2533.677}},sampler_results={'episode_reward_max': 9.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 2.3223684210526314, 'episode_len_mean': 835.8684210526316, 'episode_media': {}, 'episodes_this_iter': 152, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 0.0, 0.0, 5.0, 0.0, 9.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
9.0, 9.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0], 'episode_lengths': [540, 528, 515, 1422, 535, 1616, 1624, 1640, 1636, 536, 1640, 531, 535, 519, 528, 533, 520, 537, 540, 536, 540, 539, 1640, 519, 541, 520, 537, 524, 516, 1628, 1636, 535, 1410, 536, 539, 533, 528, 537, 1406, 521, 516, 516, 540, 529, 517, 528, 523, 516, 1620, 1636, 1224, 529, 527, 527, 515, 519, 1628, 1624, 1628, 519, 540, 520, 517, 543, 535, 520, 1308, 531, 515, 516, 1632, 1616, 536, 529, 1632, 1636, 516, 537, 517, 527, 535, 519, 524, 541, 515, 536, 520, 1644, 527, 539, 1616, 527, 1628, 541, 528, 1406, 525, 517, 529, 1426, 521, 520, 523, 516, 521, 528, 529, 1644, 1640, 515, 537, 524, 524, 517, 769, 531, 519, 1624, 1640, 531, 1644, 1620, 516, 516, 536, 536, 1426, 1418, 523, 1430, 531, 520, 535, 1620, 1644, 531, 543, 515, 1636, 1426, 529, 528, 1414, 1426, 533, 532, 541, 1644, 1620, 1640, 523, 540]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.0960928772947756, 'mean_inference_ms': 6.56534965209043, 'mean_action_processing_ms': 0.631634179682276, 'mean_env_wait_ms': 7.807842136017302, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=9.0,episode_reward_min=0.0,episode_reward_mean=2.3223684210526314,episode_len_mean=835.8684210526316,episodes_this_iter=152,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [0.0, 0.0, 0.0, 5.0, 0.0, 9.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 9.0, 9.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0], 'episode_lengths': [540, 528, 515, 1422, 535, 1616, 1624, 1640, 1636, 536, 1640, 531, 535, 519, 528, 533, 520, 537, 540, 536, 540, 539, 1640, 519, 541, 520, 537, 524, 516, 1628, 1636, 535, 1410, 536, 539, 533, 528, 537, 1406, 521, 516, 516, 540, 529, 517, 528, 523, 516, 1620, 1636, 1224, 529, 527, 527, 515, 519, 1628, 1624, 1628, 519, 540, 520, 517, 543, 535, 520, 1308, 531, 515, 516, 1632, 1616, 536, 529, 1632, 1636, 516, 537, 517, 527, 535, 519, 524, 541, 515, 536, 520, 1644, 527, 539, 1616, 527, 1628, 541, 528, 1406, 525, 517, 529, 1426, 521, 520, 523, 516, 521, 528, 529, 1644, 1640, 515, 537, 524, 524, 517, 769, 531, 519, 1624, 1640, 531, 1644, 1620, 516, 516, 536, 536, 1426, 1418, 523, 1430, 531, 520, 535, 1620, 1644, 531, 543, 515, 1636, 1426, 529, 528, 1414, 1426, 533, 532, 541, 1644, 1620, 1640, 523, 540]},sampler_perf={'mean_raw_obs_processing_ms': 3.0960928772947756, 'mean_inference_ms': 6.56534965209043, 'mean_action_processing_ms': 0.631634179682276, 'mean_env_wait_ms': 7.807842136017302, 'mean_env_render_ms': 
0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=20,num_remote_worker_restarts=0,num_agent_steps_sampled=55250,num_agent_steps_trained=55000,num_env_steps_sampled=55250,num_env_steps_trained=55000,num_env_steps_sampled_this_iter=30000,num_env_steps_trained_this_iter=30000,num_steps_trained_this_iter=30000,agent_timesteps_total=55250,timers={'training_iteration_time_ms': 0.347, 'sample_time_ms': 0.235, 'synch_weights_time_ms': 0.026},counters={'num_env_steps_sampled': 55250, 'num_env_steps_trained': 55000, 'num_agent_steps_sampled': 55250, 'num_agent_steps_trained': 55000, 'num_training_step_calls_since_last_synch_worker_weights': 1036, 'num_weight_broadcasts': 143, 'num_samples_added_to_queue': 55000},perf={'cpu_util_percent': 34.7, 'ram_util_percent': 5.24375} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}. 
Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': -1.586639165878296, 'policy_loss': 1.0538454055786133, 'entropy': 0.7403178215026855, 'entropy_coeff': 0.01, 'var_gnorm': 11.36281967163086, 'vf_loss': 1.9741454124450684, 'vf_explained_var': 0.9754490852355957}, 'model': {}, 'num_grad_updates_lifetime': 170.0, 'diff_num_grad_updates_vs_sampler_policy': 11.5}}, 'num_env_steps_sampled': 85500, 'num_env_steps_trained': 85000, 'num_agent_steps_sampled': 85500, 'num_agent_steps_trained': 85000, 'num_training_step_calls_since_last_synch_worker_weights': 0, 'num_weight_broadcasts': 222, 'num_samples_added_to_queue': 85500, 'learner_queue': {'size_count': 171, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 119.573, 'learner_load_time_ms': 4.85, 'learner_load_wait_time_ms': 60.447, 'learner_dequeue_time_ms': 2872.035}},sampler_results={'episode_reward_max': 9.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 2.704697986577181, 'episode_len_mean': 860.6912751677852, 'episode_media': {}, 'episodes_this_iter': 149, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 6.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 1.0, 0.0, 9.0, 
0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0], 'episode_lengths': [536, 1640, 524, 524, 1616, 531, 1624, 1628, 543, 540, 533, 540, 1616, 531, 517, 516, 519, 527, 540, 516, 525, 529, 541, 529, 536, 1624, 1624, 1620, 528, 528, 536, 1620, 536, 1628, 1616, 525, 1628, 543, 1628, 527, 532, 525, 1640, 520, 537, 515, 520, 1624, 1620, 532, 525, 521, 515, 1628, 1628, 523, 540, 539, 1632, 541, 528, 1640, 1624, 515, 1620, 1620, 516, 541, 519, 529, 517, 521, 520, 1616, 529, 543, 539, 515, 535, 515, 537, 1616, 1616, 517, 539, 521, 515, 523, 519, 528, 527, 533, 528, 536, 524, 540, 1620, 517, 527, 1632, 536, 1628, 1620, 525, 508, 537, 1628, 1644, 1636, 521, 536, 525, 1636, 527, 533, 525, 519, 528, 527, 1628, 519, 637, 536, 1640, 528, 529, 1644, 1644, 517, 520, 541, 1644, 527, 519, 527, 516, 529, 537, 1632, 1632, 528, 1616, 1640, 1628, 524, 1636, 533, 536, 511]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.0718346838518067, 'mean_inference_ms': 6.57333212573228, 'mean_action_processing_ms': 0.6328219398377659, 'mean_env_wait_ms': 7.789589344047068, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=9.0,episode_reward_min=0.0,episode_reward_mean=2.704697986577181,episode_len_mean=860.6912751677852,episodes_this_iter=149,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 9.0, 6.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 0.0, 
0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 1.0, 0.0, 9.0, 0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 0.0, 9.0, 9.0, 9.0, 0.0, 9.0, 0.0, 0.0, 0.0], 'episode_lengths': [536, 1640, 524, 524, 1616, 531, 1624, 1628, 543, 540, 533, 540, 1616, 531, 517, 516, 519, 527, 540, 516, 525, 529, 541, 529, 536, 1624, 1624, 1620, 528, 528, 536, 1620, 536, 1628, 1616, 525, 1628, 543, 1628, 527, 532, 525, 1640, 520, 537, 515, 520, 1624, 1620, 532, 525, 521, 515, 1628, 1628, 523, 540, 539, 1632, 541, 528, 1640, 1624, 515, 1620, 1620, 516, 541, 519, 529, 517, 521, 520, 1616, 529, 543, 539, 515, 535, 515, 537, 1616, 1616, 517, 539, 521, 515, 523, 519, 528, 527, 533, 528, 536, 524, 540, 1620, 517, 527, 1632, 536, 1628, 1620, 525, 508, 537, 1628, 1644, 1636, 521, 536, 525, 1636, 527, 533, 525, 519, 528, 527, 1628, 519, 637, 536, 1640, 528, 529, 1644, 1644, 517, 520, 541, 1644, 527, 519, 527, 516, 529, 537, 1632, 1632, 528, 1616, 1640, 1628, 524, 1636, 533, 536, 511]},sampler_perf={'mean_raw_obs_processing_ms': 3.0718346838518067, 'mean_inference_ms': 6.57333212573228, 'mean_action_processing_ms': 0.6328219398377659, 'mean_env_wait_ms': 7.789589344047068, 'mean_env_render_ms': 0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=19,num_remote_worker_restarts=0,num_agent_steps_sampled=85500,num_agent_steps_trained=85000,num_env_steps_sampled=85500,num_env_steps_trained=85000,num_env_steps_sampled_this_iter=30250,num_env_steps_trained_this_iter=30000,num_steps_trained_this_iter=30000,agent_timesteps_total=85500,timers={'training_iteration_time_ms': 1.886, 'sample_time_ms': 0.337, 'synch_weights_time_ms': 1.013},counters={'num_env_steps_sampled': 85500, 'num_env_steps_trained': 85000, 'num_agent_steps_sampled': 85500, 'num_agent_steps_trained': 85000, 'num_training_step_calls_since_last_synch_worker_weights': 0, 'num_weight_broadcasts': 222, 'num_samples_added_to_queue': 
85500},perf={'cpu_util_percent': 35.34375, 'ram_util_percent': 5.225} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}. == Status == Current time: 2023-03-29 09:23:10 (running for 00:01:06.85) Using FIFO scheduling algorithm. Logical resource usage: 11.0/32 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:M60) Result logdir: /home/ray/ray_results/impala-breakoutnoframeskip-v5-torch Number of trials: 1/1 (1 RUNNING) +------------------------------------+----------+-------------------+-------------+--------+----------------+----------------+----------------+------------------+---------------+ | Trial name | status | loc | framework | iter | time_total_s | ts (sampled) | ts (trained) | train_episodes | reward_mean | |------------------------------------+----------+-------------------+-------------+--------+----------------+----------------+----------------+------------------+---------------| | IMPALA_ALE_Breakout-v5_d70d1_00000 | RUNNING | 10.0.2.182:127980 | torch | 3 | 34.3609 | 85500 | 85000 | 149 | 2.7047 | +------------------------------------+----------+-------------------+-------------+--------+----------------+----------------+----------------+------------------+---------------+

Trial IMPALA_ALE_Breakout-v5_d70d1_00000 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_lr': 0.0005, 'total_loss': 13.279269218444824, 'policy_loss': 14.329699516296387, 'entropy': 0.9800087213516235, 'entropy_coeff': 0.01, 'var_gnorm': 12.450754165649414, 'vf_loss': 7.5032267570495605, 'vf_explained_var': 0.7622262835502625}, 'model': {}, 'num_grad_updates_lifetime': 230.0, 'diff_num_grad_updates_vs_sampler_policy': 11.5}}, 'num_env_steps_sampled': 115250, 'num_env_steps_trained': 115000, 'num_agent_steps_sampled': 115250, 'num_agent_steps_trained': 115000, 'num_training_step_calls_since_last_synch_worker_weights': 755, 'num_weight_broadcasts': 301, 'num_samples_added_to_queue': 115000, 'learner_queue': {'size_count': 230, 'size_mean': 0.0, 'size_std': 0.0, 'size_quantiles': [0.0, 0.0, 0.0, 0.0, 0.0]}, 'timing_breakdown': {'learner_grad_time_ms': 124.085, 'learner_load_time_ms': 4.89, 'learner_load_wait_time_ms': 58.161, 'learner_dequeue_time_ms': 2890.702}},sampler_results={'episode_reward_max': 10.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 1.6746987951807228, 'episode_len_mean': 777.6144578313254, 'episode_media': {}, 'episodes_this_iter': 166, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [4.0, 2.0, 0.0, 3.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 3.0, 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 3.0, 3.0, 4.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 1.0, 5.0, 0.0, 4.0, 0.0, 9.0, 4.0, 3.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 5.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 10.0, 1.0, 4.0, 5.0, 0.0, 0.0, 3.0, 0.0, 3.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 6.0, 0.0, 1.0, 3.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 4.0, 3.0, 2.0, 2.0, 
0.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 6.0, 9.0, 0.0, 6.0, 3.0, 3.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 3.0, 2.0, 0.0, 0.0, 3.0, 2.0, 10.0, 1.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 3.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0], 'episode_lengths': [1214, 834, 516, 1022, 842, 492, 811, 806, 520, 1138, 513, 931, 822, 517, 812, 515, 525, 1202, 1026, 1014, 1202, 504, 636, 509, 809, 795, 516, 507, 512, 519, 810, 537, 810, 711, 1218, 526, 517, 830, 509, 835, 833, 829, 529, 648, 1317, 512, 1212, 532, 1624, 1207, 939, 543, 542, 830, 635, 508, 628, 516, 817, 507, 520, 1111, 1312, 795, 543, 529, 623, 720, 631, 809, 505, 532, 517, 503, 1144, 504, 520, 1300, 1296, 525, 503, 705, 507, 1730, 704, 1218, 1422, 514, 533, 872, 519, 935, 805, 520, 513, 830, 509, 718, 507, 1610, 533, 641, 1015, 814, 522, 839, 517, 817, 523, 620, 523, 812, 529, 524, 962, 995, 1004, 524, 1210, 1010, 806, 838, 517, 802, 790, 799, 697, 531, 536, 511, 890, 621, 1525, 1620, 520, 1611, 1018, 1026, 522, 761, 645, 804, 895, 515, 933, 814, 508, 535, 1000, 811, 1822, 697, 507, 512, 1230, 818, 819, 796, 907, 712, 532, 908, 531, 523, 525, 1205]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 3.094785060279142, 'mean_inference_ms': 6.571849059434162, 'mean_action_processing_ms': 0.6336455005969693, 'mean_env_wait_ms': 7.782669379977965, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {}},episode_reward_max=10.0,episode_reward_min=0.0,episode_reward_mean=1.6746987951807228,episode_len_mean=777.6144578313254,episodes_this_iter=166,policy_reward_min={},policy_reward_max={},policy_reward_mean={},hist_stats={'episode_reward': [4.0, 2.0, 0.0, 3.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 3.0, 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 3.0, 3.0, 4.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 1.0, 5.0, 0.0, 4.0, 0.0, 9.0, 4.0, 3.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 5.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 
0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 10.0, 1.0, 4.0, 5.0, 0.0, 0.0, 3.0, 0.0, 3.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 6.0, 0.0, 1.0, 3.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 4.0, 3.0, 2.0, 2.0, 0.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 6.0, 9.0, 0.0, 6.0, 3.0, 3.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 3.0, 2.0, 0.0, 0.0, 3.0, 2.0, 10.0, 1.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 3.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0], 'episode_lengths': [1214, 834, 516, 1022, 842, 492, 811, 806, 520, 1138, 513, 931, 822, 517, 812, 515, 525, 1202, 1026, 1014, 1202, 504, 636, 509, 809, 795, 516, 507, 512, 519, 810, 537, 810, 711, 1218, 526, 517, 830, 509, 835, 833, 829, 529, 648, 1317, 512, 1212, 532, 1624, 1207, 939, 543, 542, 830, 635, 508, 628, 516, 817, 507, 520, 1111, 1312, 795, 543, 529, 623, 720, 631, 809, 505, 532, 517, 503, 1144, 504, 520, 1300, 1296, 525, 503, 705, 507, 1730, 704, 1218, 1422, 514, 533, 872, 519, 935, 805, 520, 513, 830, 509, 718, 507, 1610, 533, 641, 1015, 814, 522, 839, 517, 817, 523, 620, 523, 812, 529, 524, 962, 995, 1004, 524, 1210, 1010, 806, 838, 517, 802, 790, 799, 697, 531, 536, 511, 890, 621, 1525, 1620, 520, 1611, 1018, 1026, 522, 761, 645, 804, 895, 515, 933, 814, 508, 535, 1000, 811, 1822, 697, 507, 512, 1230, 818, 819, 796, 907, 712, 532, 908, 531, 523, 525, 1205]},sampler_perf={'mean_raw_obs_processing_ms': 3.094785060279142, 'mean_inference_ms': 6.571849059434162, 'mean_action_processing_ms': 0.6336455005969693, 'mean_env_wait_ms': 7.782669379977965, 'mean_env_render_ms': 0.0},num_faulty_episodes=0,connector_metrics={},num_healthy_workers=10,num_in_flight_async_reqs=20,num_remote_worker_restarts=0,num_agent_steps_sampled=115250,num_agent_steps_trained=115000,num_env_steps_sampled=115250,num_env_steps_trained=115000,num_env_steps_sampled_this_iter=29750,num_env_steps_trained_this_iter=30000,num_steps_trained_this_iter=30000,agent_timesteps_total=115250,timers={'training_iteration_time_ms': 0.367, 
'sample_time_ms': 0.255, 'synch_weights_time_ms': 0.028},counters={'num_env_steps_sampled': 115250, 'num_env_steps_trained': 115000, 'num_agent_steps_sampled': 115250, 'num_agent_steps_trained': 115000, 'num_training_step_calls_since_last_synch_worker_weights': 755, 'num_weight_broadcasts': 301, 'num_samples_added_to_queue': 115000},perf={'cpu_util_percent': 34.54117647058823, 'ram_util_percent': 5.2294117647058815} with parameters={'env_config': {'frameskip': 1, 'full_action_space': False, 'repeat_action_probability': 0.0}, 'rollout_fragment_length': 50, 'train_batch_size': 500, 'num_workers': 10, 'num_envs_per_worker': 5, 'clip_rewards': True, 'lr': 0.0005, 'num_gpus': 1, 'framework': 'torch', 'env': 'ALE/Breakout-v5'}.
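The dump above shows the core problem: the trial report is a deeply nested dict with dozens of metrics, most of which are noise for a default view. One way to address point 2 would be to flatten the nested result and keep only an allowlisted handful of keys before printing. The sketch below is purely illustrative — the helper names and the `DEFAULT_METRICS` allowlist are assumptions, not part of the Ray API; key names mirror the log above.

```python
# Hypothetical sketch: reduce a nested RLlib-style result dict to a short
# allowlist of metrics before printing. Names here are illustrative only.

DEFAULT_METRICS = [
    "episode_reward_mean",
    "episode_len_mean",
    "episodes_this_iter",
    "counters/num_env_steps_sampled",
    "counters/num_env_steps_trained",
]

def flatten(d, prefix=""):
    """Flatten a nested dict into {'a/b/c': value} form."""
    out = {}
    for k, v in d.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, prefix=key + "/"))
        else:
            out[key] = v
    return out

def summarize(result, keys=DEFAULT_METRICS):
    """Keep only the allowlisted metrics, in allowlist order."""
    flat = flatten(result)
    return {k: flat[k] for k in keys if k in flat}

# Tiny stand-in for the trial result dumped above.
result = {
    "episode_reward_mean": 1.67,
    "episode_len_mean": 777.6,
    "episodes_this_iter": 166,
    "hist_stats": {"episode_reward": [4.0, 2.0, 0.0]},  # dropped by the allowlist
    "counters": {
        "num_env_steps_sampled": 115250,
        "num_env_steps_trained": 115000,
    },
}
print(summarize(result))
```

With a per-algorithm allowlist, RLlib could ship sensible defaults while still letting users opt into the full dump at higher verbosity levels.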



### Versions / Dependencies

nightly

### Reproduction script

learning_tests_impala_torch

### Issue Severity

Low: It annoys or frustrates me.
xwjiang2010 commented 1 year ago

For 1 and 2, can you check with the RLlib folks what the preferred output is? I am not sure what should replace "iteration".

Item 4 will be fixed by https://github.com/ray-project/ray/pull/33871

scottsun94 commented 1 year ago

@gjoliver @kouroshHakha can you comment on items 1 and 2?