Closed mvindiola1 closed 3 years ago
@sven1977,
A new error popped up on this today.
(pid=565444) File "ray/rllib/policy/sample_batch.py", line 82, in __init__
(pid=565444) if self.get("seq_lens") is None or self.get("seq_lens") == []:
(pid=565444) TypeError: eq() received an invalid combination of arguments - got (list), but expected one of:
(pid=565444) * (Tensor other)
(pid=565444) didn't match because some of the arguments have invalid types: (!list!)
(pid=565444) * (Number other)
(pid=565444) didn't match because some of the arguments have invalid types: (!list!)
Traceback (most recent call last):
It appears to come from comparing a non-empty numpy array to an empty list. I fixed it like this:
if self.get("seq_lens") is None or np.array_equal(self.get("seq_lens"), []):
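A minimal sketch of the emptiness check described above, kept outside RLlib so it can be run on its own. The `eq()` signature in the error message suggests the stored value may have been a torch tensor rather than a NumPy array, but that is an inference from the traceback. The helper names `seq_lens_missing` and `seq_lens_missing_by_len` are hypothetical and are not part of `sample_batch.py`; only NumPy is used here, so tensor behavior is not exercised.

```python
import numpy as np


def seq_lens_missing(seq_lens):
    """Hypothetical helper: True if seq_lens is None or has no entries.

    Mirrors the np.array_equal check quoted in the thread.
    """
    if seq_lens is None:
        return True
    # np.array_equal returns a single bool, so it is safe inside `if`.
    # A plain `seq_lens == []` is not: for arrays and tensors its result
    # (or whether it raises) depends on the library and version.
    return bool(np.array_equal(seq_lens, []))


def seq_lens_missing_by_len(seq_lens):
    """Alternative sketch: lists and 1-D arrays both support len()."""
    return seq_lens is None or len(seq_lens) == 0


# Try the checks on the three cases the original condition cares about.
for value in (None, [], np.array([4, 4, 2])):
    print(repr(value), seq_lens_missing(value), seq_lens_missing_by_len(value))
```

The `len()` variant avoids building a temporary array for the comparison, at the cost of assuming the value is a sequence with at least one dimension.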
Very cool, thanks for this catch @mvindiola1! Fixing this should increase R2D2's learning capabilities. Here is the PR that fixes this: https://github.com/ray-project/ray/pull/15737
@sven1977,
I just installed the nightly wheel to retest. I am still getting the error I mentioned above in "ray/rllib/policy/sample_batch.py", line 82. Once I fixed that error and applied the pull request, it worked for me.
It learns StatelessCartPole with horizon=200 in 180 iterations:
'episode_reward_max': 200.0, 'episode_reward_min': 140.0, 'episode_reward_mean': 197.56, 'episode_len_mean': 197.56, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 161.0, 200.0, 200.0, 158.0, 200.0, 180.0, 200.0, 200.0, 162.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 140.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 192.0, 200.0, 200.0, 200.0, 200.0, 200.0, 170.0, 193.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 161, 200, 200, 158, 200, 180, 200, 200, 162, 200, 200, 200, 200, 200, 200, 200, 200, 200, 140, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 192, 200, 200, 200, 200, 200, 170, 193, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.06287308769977001, 'mean_inference_ms': 1.0215408508655708, 'mean_action_processing_ms': 0.04289268263232037, 'mean_env_wait_ms': 0.07897938255070529, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 308115, 'agent_timesteps_total': 308115, 'timers': {'learn_time_ms': 25.774, 'learn_throughput': 1241.543, 'update_time_ms': 1.676}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.31844152633625034, 'cur_lr': 0.0005, 'mean_q': 7.742862701416016, 'min_q': 0.7774804830551147, 'max_q': 10.037823677062988, 'mean_td_error': -0.026085052639245987}}, 'num_steps_sampled': 308115, 'num_agent_steps_sampled': 308115, 'num_steps_trained': 56096, 'num_agent_steps_trained': 2010932, 'last_target_update_ts': 307515, 'num_target_updates': 114}, 'done': False, 'episodes_total': 5304, 'training_iteration': 170, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-05-58', 'timestamp': 1620824758, 'time_this_iter_s': 1.1060619354248047, 'time_total_s': 213.0190873146057, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 213.0190873146057, 'timesteps_since_restore': 0, 'iterations_since_restore': 170, 'perf': {'cpu_util_percent': 38.85, 'ram_util_percent': 87.8}}
{'episode_reward_max': 200.0, 'episode_reward_min': 140.0, 'episode_reward_mean': 198.95, 'episode_len_mean': 198.95, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 140.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 192.0, 200.0, 200.0, 200.0, 200.0, 200.0, 170.0, 193.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 140, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 192, 200, 200, 200, 200, 200, 170, 193, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.06281894807194328, 'mean_inference_ms': 1.0212131353771958, 'mean_action_processing_ms': 0.042880792895047766, 'mean_env_wait_ms': 0.07895747886828303, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 310515, 'agent_timesteps_total': 310515, 'timers': {'learn_time_ms': 25.655, 'learn_throughput': 1247.335, 'update_time_ms': 1.616}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.431170681735619, 'cur_lr': 0.0005, 'mean_q': 6.975432872772217, 'min_q': -0.9237803220748901, 'max_q': 10.05516529083252, 'mean_td_error': 0.06322290003299713}}, 'num_steps_sampled': 310515, 'num_agent_steps_sampled': 310515, 'num_steps_trained': 56224, 'num_agent_steps_trained': 2015978, 'last_target_update_ts': 310515, 'num_target_updates': 115}, 'done': False, 'episodes_total': 5316, 'training_iteration': 171, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-05-59', 'timestamp': 1620824759, 'time_this_iter_s': 1.1227998733520508, 'time_total_s': 214.14188718795776, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 214.14188718795776, 'timesteps_since_restore': 0, 'iterations_since_restore': 171, 'perf': {'cpu_util_percent': 37.0, 'ram_util_percent': 87.8}}
{'episode_reward_max': 200.0, 'episode_reward_min': 170.0, 'episode_reward_mean': 199.45, 'episode_len_mean': 199.45, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 192.0, 200.0, 200.0, 200.0, 200.0, 200.0, 170.0, 193.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 190.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 192, 200, 200, 200, 200, 200, 170, 193, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.06276733630089948, 'mean_inference_ms': 1.0209137937890191, 'mean_action_processing_ms': 0.042870339700403104, 'mean_env_wait_ms': 0.07893737765500464, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 312905, 'agent_timesteps_total': 312905, 'timers': {'learn_time_ms': 25.489, 'learn_throughput': 1255.425, 'update_time_ms': 1.616}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.34125255657444326, 'cur_lr': 0.0005, 'mean_q': 6.963379859924316, 'min_q': -0.5268465876579285, 'max_q': 10.069169044494629, 'mean_td_error': 0.03860087692737579}}, 'num_steps_sampled': 312905, 'num_agent_steps_sampled': 312905, 'num_steps_trained': 56352, 'num_agent_steps_trained': 2021059, 'last_target_update_ts': 310515, 'num_target_updates': 115}, 'done': False, 'episodes_total': 5328, 'training_iteration': 172, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-00', 'timestamp': 1620824760, 'time_this_iter_s': 1.1238863468170166, 'time_total_s': 215.26577353477478, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 215.26577353477478, 'timesteps_since_restore': 0, 'iterations_since_restore': 172, 'perf': {'cpu_util_percent': 38.2, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 170.0, 'episode_reward_mean': 199.53, 'episode_len_mean': 199.53, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [170.0, 193.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 190.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [170, 193, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.06271810841313205, 'mean_inference_ms': 1.0206429636927172, 'mean_action_processing_ms': 0.0428612767292952, 'mean_env_wait_ms': 0.07892013414581119, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 315305, 'agent_timesteps_total': 315305, 'timers': {'learn_time_ms': 25.404, 'learn_throughput': 1259.668, 'update_time_ms': 1.683}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.21955293016168062, 'cur_lr': 0.0005, 'mean_q': 7.147408485412598, 'min_q': -0.9092021584510803, 'max_q': 9.924005508422852, 'mean_td_error': -0.017582902684807777}}, 'num_steps_sampled': 315305, 'num_agent_steps_sampled': 315305, 'num_steps_trained': 56480, 'num_agent_steps_trained': 2026114, 'last_target_update_ts': 313505, 'num_target_updates': 116}, 'done': False, 'episodes_total': 5340, 'training_iteration': 173, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-01', 'timestamp': 1620824761, 'time_this_iter_s': 1.113065481185913, 'time_total_s': 216.3788390159607, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 216.3788390159607, 'timesteps_since_restore': 0, 'iterations_since_restore': 173, 'perf': {'cpu_util_percent': 38.2, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 190.0, 'episode_reward_mean': 199.9, 'episode_len_mean': 199.9, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 190.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.06266814528141568, 'mean_inference_ms': 1.0203411595940832, 'mean_action_processing_ms': 0.04285049762309565, 'mean_env_wait_ms': 0.07889981698397208, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 317705, 'agent_timesteps_total': 317705, 'timers': {'learn_time_ms': 25.709, 'learn_throughput': 1244.714, 'update_time_ms': 1.675}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.37422065666161136, 'cur_lr': 0.0005, 'mean_q': 6.92258882522583, 'min_q': -0.13285666704177856, 'max_q': 10.073381423950195, 'mean_td_error': -0.042649589478969574}}, 'num_steps_sampled': 317705, 'num_agent_steps_sampled': 317705, 'num_steps_trained': 56608, 'num_agent_steps_trained': 2031168, 'last_target_update_ts': 316505, 'num_target_updates': 117}, 'done': False, 'episodes_total': 5352, 'training_iteration': 174, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-03', 'timestamp': 1620824763, 'time_this_iter_s': 1.074005126953125, 'time_total_s': 217.45284414291382, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 217.45284414291382, 'timesteps_since_restore': 0, 'iterations_since_restore': 174, 'perf': {'cpu_util_percent': 38.3, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 190.0, 'episode_reward_mean': 199.9, 'episode_len_mean': 199.9, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 190.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.06262104052091402, 'mean_inference_ms': 1.020075957918919, 'mean_action_processing_ms': 0.042841427138241246, 'mean_env_wait_ms': 0.0788824312415027, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 320105, 'agent_timesteps_total': 320105, 'timers': {'learn_time_ms': 26.0, 'learn_throughput': 1230.768, 'update_time_ms': 1.687}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.2813318723431594, 'cur_lr': 0.0005, 'mean_q': 7.188282012939453, 'min_q': -0.13043101131916046, 'max_q': 10.191689491271973, 'mean_td_error': 0.003697199746966362}}, 'num_steps_sampled': 320105, 'num_agent_steps_sampled': 320105, 'num_steps_trained': 56736, 'num_agent_steps_trained': 2036226, 'last_target_update_ts': 319505, 'num_target_updates': 118}, 'done': False, 'episodes_total': 5364, 'training_iteration': 175, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-04', 'timestamp': 1620824764, 'time_this_iter_s': 1.2007229328155518, 'time_total_s': 218.65356707572937, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 218.65356707572937, 'timesteps_since_restore': 0, 'iterations_since_restore': 175, 'perf': {'cpu_util_percent': 39.35, 'ram_util_percent': 87.8}}
{'episode_reward_max': 200.0, 'episode_reward_min': 190.0, 'episode_reward_mean': 199.9, 'episode_len_mean': 199.9, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 190.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.06257304882083314, 'mean_inference_ms': 1.019794830274472, 'mean_action_processing_ms': 0.04283151103675581, 'mean_env_wait_ms': 0.0788628220245302, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 322505, 'agent_timesteps_total': 322505, 'timers': {'learn_time_ms': 26.182, 'learn_throughput': 1222.231, 'update_time_ms': 1.708}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.3615074530296873, 'cur_lr': 0.0005, 'mean_q': 7.7929277420043945, 'min_q': -0.17142601311206818, 'max_q': 10.296675682067871, 'mean_td_error': 0.05277141556143761}}, 'num_steps_sampled': 322505, 'num_agent_steps_sampled': 322505, 'num_steps_trained': 56864, 'num_agent_steps_trained': 2041339, 'last_target_update_ts': 322505, 'num_target_updates': 119}, 'done': False, 'episodes_total': 5376, 'training_iteration': 176, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-05', 'timestamp': 1620824765, 'time_this_iter_s': 1.0855655670166016, 'time_total_s': 219.73913264274597, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 219.73913264274597, 'timesteps_since_restore': 0, 'iterations_since_restore': 176, 'perf': {'cpu_util_percent': 36.5, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 190.0, 'episode_reward_mean': 199.9, 'episode_len_mean': 199.9, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 190.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.06252651714212196, 'mean_inference_ms': 1.019532054097064, 'mean_action_processing_ms': 0.0428223020803995, 'mean_env_wait_ms': 0.07884424296269503, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 324905, 'agent_timesteps_total': 324905, 'timers': {'learn_time_ms': 25.915, 'learn_throughput': 1234.8, 'update_time_ms': 1.745}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.38570513598510225, 'cur_lr': 0.0005, 'mean_q': 7.848695278167725, 'min_q': 0.25453951954841614, 'max_q': 10.175018310546875, 'mean_td_error': 0.042338136583566666}}, 'num_steps_sampled': 324905, 'num_agent_steps_sampled': 324905, 'num_steps_trained': 56992, 'num_agent_steps_trained': 2046446, 'last_target_update_ts': 322505, 'num_target_updates': 119}, 'done': False, 'episodes_total': 5388, 'training_iteration': 177, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-06', 'timestamp': 1620824766, 'time_this_iter_s': 1.150735855102539, 'time_total_s': 220.8898684978485, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 220.8898684978485, 'timesteps_since_restore': 0, 'iterations_since_restore': 177, 'perf': {'cpu_util_percent': 38.25, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 190.0, 'episode_reward_mean': 199.9, 'episode_len_mean': 199.9, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 190.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.062479874185533066, 'mean_inference_ms': 1.0192539721389446, 'mean_action_processing_ms': 0.042812479534710395, 'mean_env_wait_ms': 0.07882436624227009, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 327305, 'agent_timesteps_total': 327305, 'timers': {'learn_time_ms': 25.76, 'learn_throughput': 1242.228, 'update_time_ms': 1.705}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.0903826843289689, 'cur_lr': 0.0005, 'mean_q': 7.412255764007568, 'min_q': -0.3941640853881836, 'max_q': 10.086545944213867, 'mean_td_error': 0.007833817973732948}}, 'num_steps_sampled': 327305, 'num_agent_steps_sampled': 327305, 'num_steps_trained': 57120, 'num_agent_steps_trained': 2051558, 'last_target_update_ts': 325505, 'num_target_updates': 120}, 'done': False, 'episodes_total': 5400, 'training_iteration': 178, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-07', 'timestamp': 1620824767, 'time_this_iter_s': 1.0711538791656494, 'time_total_s': 221.96102237701416, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 221.96102237701416, 'timesteps_since_restore': 0, 'iterations_since_restore': 178, 'perf': {'cpu_util_percent': 39.5, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 190.0, 'episode_reward_mean': 199.9, 'episode_len_mean': 199.9, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 190.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.06243414613037514, 'mean_inference_ms': 1.0189611805105032, 'mean_action_processing_ms': 0.04280207902663839, 'mean_env_wait_ms': 0.0788045598189036, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 329705, 'agent_timesteps_total': 329705, 'timers': {'learn_time_ms': 26.139, 'learn_throughput': 1224.245, 'update_time_ms': 1.625}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.37471600314348835, 'cur_lr': 0.0005, 'mean_q': 7.61482572555542, 'min_q': 0.3935427963733673, 'max_q': 9.71854019165039, 'mean_td_error': -0.04087908938527107}}, 'num_steps_sampled': 329705, 'num_agent_steps_sampled': 329705, 'num_steps_trained': 57248, 'num_agent_steps_trained': 2056678, 'last_target_update_ts': 328505, 'num_target_updates': 121}, 'done': False, 'episodes_total': 5412, 'training_iteration': 179, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-08', 'timestamp': 1620824768, 'time_this_iter_s': 1.0958092212677002, 'time_total_s': 223.05683159828186, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 223.05683159828186, 'timesteps_since_restore': 0, 'iterations_since_restore': 179, 'perf': {'cpu_util_percent': 38.75, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 200.0, 'episode_reward_mean': 200.0, 'episode_len_mean': 200.0, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.062389638677530924, 'mean_inference_ms': 1.0186756355524342, 'mean_action_processing_ms': 0.04279229577754453, 'mean_env_wait_ms': 0.07878638092914074, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 332105, 'agent_timesteps_total': 332105, 'timers': {'learn_time_ms': 26.455, 'learn_throughput': 1209.601, 'update_time_ms': 1.636}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.48613422447969534, 'cur_lr': 0.0005, 'mean_q': 7.511012077331543, 'min_q': -0.8970309495925903, 'max_q': 10.279221534729004, 'mean_td_error': -0.054695166647434235}}, 'num_steps_sampled': 332105, 'num_agent_steps_sampled': 332105, 'num_steps_trained': 57376, 'num_agent_steps_trained': 2061752, 'last_target_update_ts': 331505, 'num_target_updates': 122}, 'done': False, 'episodes_total': 5424, 'training_iteration': 180, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-09', 'timestamp': 1620824769, 'time_this_iter_s': 1.1090872287750244, 'time_total_s': 224.16591882705688, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 224.16591882705688, 'timesteps_since_restore': 0, 'iterations_since_restore': 180, 'perf': {'cpu_util_percent': 39.75, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 200.0, 'episode_reward_mean': 200.0, 'episode_len_mean': 200.0, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.0623461045474025, 'mean_inference_ms': 1.018386671366023, 'mean_action_processing_ms': 0.04278194422583663, 'mean_env_wait_ms': 0.0787682835595202, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 334505, 'agent_timesteps_total': 334505, 'timers': {'learn_time_ms': 26.623, 'learn_throughput': 1201.969, 'update_time_ms': 1.703}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.03724442114095094, 'cur_lr': 0.0005, 'mean_q': 7.333806991577148, 'min_q': -0.9660881757736206, 'max_q': 9.664224624633789, 'mean_td_error': -0.0020993368234485388}}, 'num_steps_sampled': 334505, 'num_agent_steps_sampled': 334505, 'num_steps_trained': 57504, 'num_agent_steps_trained': 2066848, 'last_target_update_ts': 334505, 'num_target_updates': 123}, 'done': False, 'episodes_total': 5436, 'training_iteration': 181, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-10', 'timestamp': 1620824770, 'time_this_iter_s': 1.1047751903533936, 'time_total_s': 225.27069401741028, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 225.27069401741028, 'timesteps_since_restore': 0, 'iterations_since_restore': 181, 'perf': {'cpu_util_percent': 38.2, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 200.0, 'episode_reward_mean': 200.0, 'episode_len_mean': 200.0, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.062305249773172235, 'mean_inference_ms': 1.018119758799284, 'mean_action_processing_ms': 0.042772701367368174, 'mean_env_wait_ms': 0.07875249271504577, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 336905, 'agent_timesteps_total': 336905, 'timers': {'learn_time_ms': 26.852, 'learn_throughput': 1191.732, 'update_time_ms': 1.743}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.2965784887882237, 'cur_lr': 0.0005, 'mean_q': 7.364220142364502, 'min_q': 0.19771356880664825, 'max_q': 9.247050285339355, 'mean_td_error': 0.031875595450401306}}, 'num_steps_sampled': 336905, 'num_agent_steps_sampled': 336905, 'num_steps_trained': 57632, 'num_agent_steps_trained': 2071959, 'last_target_update_ts': 334505, 'num_target_updates': 123}, 'done': False, 'episodes_total': 5448, 'training_iteration': 182, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-12', 'timestamp': 1620824772, 'time_this_iter_s': 1.1147851943969727, 'time_total_s': 226.38547921180725, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': False, 
'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 226.38547921180725, 'timesteps_since_restore': 0, 'iterations_since_restore': 182, 'perf': {'cpu_util_percent': 39.95, 'ram_util_percent': 87.7}}
{'episode_reward_max': 200.0, 'episode_reward_min': 200.0, 'episode_reward_mean': 200.0, 'episode_len_mean': 200.0, 'episode_media': {}, 'episodes_this_iter': 12, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {'default_policy': {}}, 'hist_stats': {'episode_reward': [200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0], 'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.062266849022809524, 'mean_inference_ms': 1.0178770418698306, 'mean_action_processing_ms': 0.042764752753851845, 'mean_env_wait_ms': 0.07874011585976008, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}, 'num_healthy_workers': 3, 'timesteps_total': 339305, 'agent_timesteps_total': 339305, 'timers': {'learn_time_ms': 26.79, 'learn_throughput': 1194.479, 'update_time_ms': 1.697}, 'info': {'learner': 
{'default_policy': {'allreduce_latency': 0.0, 'grad_gnorm': 0.05308342865454339, 'cur_lr': 0.0005, 'mean_q': 7.577914714813232, 'min_q': -0.22684314846992493, 'max_q': 9.941761016845703, 'mean_td_error': -0.00033914807136170566}}, 'num_steps_sampled': 339305, 'num_agent_steps_sampled': 339305, 'num_steps_trained': 57760, 'num_agent_steps_trained': 2077079, 'last_target_update_ts': 337505, 'num_target_updates': 124}, 'done': False, 'episodes_total': 5460, 'training_iteration': 183, 'experiment_id': '9a9218b428d04beaa5f6f318b39d3761', 'date': '2021-05-12_09-06-13', 'timestamp': 1620824773, 'time_this_iter_s': 1.1376519203186035, 'time_total_s': 227.52313113212585, 'pid': 1496467, 'hostname': 'intercostal', 'node_ip': '192.168.1.216', 'config': {'num_workers': 3, 'num_envs_per_worker': 1, 'create_env_on_driver': False, 'rollout_fragment_length': 4, 'batch_mode': 'complete_episodes', 'train_batch_size': 1280, 'model': {'_use_default_native_models': False, 'fcnet_hiddens': [32], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': True, 'max_seq_len': 20, 'lstm_cell_size': 64, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 'auto', 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}, 'optimizer': {}, 'gamma': 0.997, 'horizon': 200, 'soft_horizon': 
False, 'no_done_at_end': False, 'env': 'StatelessCartPole', 'env_config': {}, 'render_env': False, 'record_env': False, 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0005, 'log_level': 'WARN', 'callbacks': <class '__main__.Callbacks'>, 'ignore_worker_failures': False, 'log_sys_usage': True, 'fake_sampler': False, 'framework': 'torch', 'eager_tracing': False, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 100000}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'evaluation_parallel_to_training': False, 'in_evaluation': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, '_fake_gpus': False, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'placement_strategy': 'PACK', 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 
'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'logger_config': None, 'simple_optimizer': True, 'monitor': -1, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': False, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'target_network_update_freq': 2500, 'buffer_size': 100000, 'replay_sequence_length': 40, 'prioritized_replay': False, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_beta_annealing_timesteps': 20000, 'prioritized_replay_eps': 1e-06, 'before_learn_on_batch': None, 'training_intensity': None, 'lr_schedule': None, 'adam_epsilon': 0.001, 'grad_clip': 40, 'learning_starts': 1000, 'worker_side_prioritization': False, 'zero_init_states': True, 'burn_in': 20, 'use_h_function': True, 'h_function_epsilon': 0.001}, 'time_since_restore': 227.52313113212585, 'timesteps_since_restore': 0, 'iterations_since_restore': 183, 'perf': {'cpu_util_percent': 39.7, 'ram_util_percent': 87.7}}
Without the patch it took 200 iterations. Each configuration was run only once, so it is not clear whether the difference is real.
Hello, mvindiola1! I see you are using "use_lstm" and "fcnet_hiddens" at the same time. I guess the agent's neural network is two FC networks followed by an LSTM, right?
@zzchuman Correct
What is the problem?
Ray version and other system information (Python version, TensorFlow version, OS): ray: master
A timeslice's burn_in time steps will contain samples from other episodes for as long as the cumulative sum of the previous episode lengths is less than the burn_in size.
Example: In cartpole/stateless cartpole if we have the following sequence lengths from a parallel rollout:
seq_lens: [13 18 20 5]
then we will have the following timeslice indexes: [(-20, 13), (-7, 31), (11, 51), (31, 56)].
The first timeslice will have only 2 eps_ids (the real eps_id plus the dummy 0 eps_id for the burn_in time). The second timeslice will have 3 eps_ids:
[0*7, eps_id_ts_0 * 13, eps_id_ts_1*18, 0*2]
I think the burn-in should not cross episode boundaries and this is a bug, but if it can, then the current behavior is OK. If it is a bug, then it is coming from the code below. The logic works for the first timestep but not for subsequent ones if we have not yet hit the burn_in size. https://github.com/ray-project/ray/blob/f93f096fc92c67011477790650783d2db04ff705/rllib/policy/rnn_sequencing.py#L374-L377
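To make the index arithmetic in the example concrete, here is a minimal standalone sketch. It is not the RLlib code: `burn_in_timeslices` and `sequences_touched` are hypothetical helpers, and the sketch assumes each entry of seq_lens is a separate episode, burn_in=20, and ignores the right zero-padding.

```python
from itertools import accumulate


def burn_in_timeslices(seq_lens, burn_in):
    """Return (begin, end) pairs; begin reaches `burn_in` steps before each sequence start."""
    ends = list(accumulate(seq_lens))
    starts = [end - length for end, length in zip(ends, seq_lens)]
    return [(start - burn_in, end) for start, end in zip(starts, ends)]


def sequences_touched(begin, end, seq_lens):
    """Indices of the sequences overlapping [begin, end); -1 marks zero padding before t=0."""
    touched = [-1] if begin < 0 else []
    start = 0
    for i, length in enumerate(seq_lens):
        seq_end = start + length
        if start < end and seq_end > begin:
            touched.append(i)
        start = seq_end
    return touched


seq_lens = [13, 18, 20, 5]
slices = burn_in_timeslices(seq_lens, burn_in=20)
print(slices)
for begin, end in slices:
    # More than one non-negative index means the slice mixes episodes.
    print((begin, end), sequences_touched(begin, end, seq_lens))
```

With these inputs the sketch reproduces the timeslice indexes listed above. It also shows that the second and third timeslices overlap more than one real sequence, which is the cross-episode burn-in described here.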
This should fix the problem. The last hunk is just to find instances of the issue.
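A check for affected timeslices could be sketched roughly as below. This is not the hunk from the patch, only a hypothetical helper: the names `crosses_episode_boundary` and `pad_id` are made up for illustration, and it assumes 0 is reserved as the dummy eps_id used for padding, as in the example above.

```python
import numpy as np


def crosses_episode_boundary(eps_ids, pad_id=0):
    """Return True if one timeslice holds more than one real (non-padding) eps_id."""
    unique_ids = np.unique(np.asarray(eps_ids))
    # Assumption: pad_id never occurs as a real episode id.
    return int(np.sum(unique_ids != pad_id)) > 1


# Second timeslice from the example, with made-up episode ids 101 and 102.
mixed = [0] * 7 + [101] * 13 + [102] * 18 + [0] * 2
# First timeslice: burn-in padding, one real episode, right padding.
clean = [0] * 20 + [101] * 13 + [0] * 7

print(crosses_episode_boundary(mixed))
print(crosses_episode_boundary(clean))
```

An assertion such as `assert not crosses_episode_boundary(eps_ids)` on each timeslice would then flag the cases where burn-in steps come from a different episode.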
Running the script below with the assertion above added to rnn_sequencing.py results in the following error.
Reproduction (REQUIRED)