kengz / SLM-Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
https://slm-lab.gitbook.io/slm-lab/
MIT License

Empty Multi trial Graph in 'search' mode #484

Closed mania087 closed 2 years ago

mania087 commented 3 years ago

Describe the bug

Hello. After following the Quick Start guide and running the SARSA example on CartPole, everything works, from the terminal output to the session graphs, except that the multi-trial graph in 'search' mode is empty. No error stands out in the log. I tried reinstalling, upgrading, and downgrading plotly-orca, but the multi-trial graph is still empty.

Sorry, the log is too long, so I can't post all of it.

Command entered:
python run_lab.py slm_lab/spec/benchmark/sarsa/sarsa_cartpole.json sarsa_epsilon_greedy_cartpole search
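
Since the full log is too long to post, one option is to condense it to the latest metrics per trial and session. The following is a minimal sketch, not part of SLM Lab: it assumes the `log_metrics` line format shown in the log excerpt in this report, and the names `summarize_metrics`, `METRICS_RE`, and `PAIR_RE` are hypothetical helpers introduced only for illustration.

```python
import re

# Matches lines shaped like the ones in this report, e.g.
# "... log_metrics] Trial 5 session 2 <session name> [train_df metrics] final_return_ma: 55  strength: 33.14 ..."
METRICS_RE = re.compile(
    r"log_metrics\] Trial (?P<trial>\d+) session (?P<session>\d+) \S+ "
    r"\[train_df metrics\] (?P<rest>.*)$"
)
# "key: number" pairs, including negative values and scientific notation.
PAIR_RE = re.compile(r"(\w+): (-?[\d.]+(?:e[+-]?\d+)?)")


def summarize_metrics(lines):
    """Return {(trial, session): metrics dict} using the last metrics line seen for each pair."""
    latest = {}
    for line in lines:
        match = METRICS_RE.search(line)
        if match is None:
            continue  # not a log_metrics line
        metrics = {key: float(value) for key, value in PAIR_RE.findall(match.group("rest"))}
        latest[(int(match.group("trial")), int(match.group("session")))] = metrics
    return latest
```

Assuming the terminal output was saved to a file (the name `run.log` is only a placeholder), `summarize_metrics(open("run.log"))` would give one small dictionary per trial and session, which is much easier to attach to an issue than the raw log.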

To Reproduce

  1. OS and environment: Ubuntu 18.04 LTS
  2. SLM Lab git SHA (run git rev-parse HEAD to get it): dda02d00031553aeda4c49c5baa7d0706c53996b
  3. spec file used: slm_lab/spec/benchmark/sarsa/sarsa_cartpole.json
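
An empty plot with no logged error often points to empty input data rather than a rendering failure. As a rough check, the sketch below counts the data rows in every CSV file found under an experiment's output directory. It is an assumption-laden illustration: only the `data/<spec name>_<timestamp>/` directory prefix is taken from the log in this report, no particular file names or layout are assumed, and `csv_row_counts` is a hypothetical helper, not an SLM Lab function.

```python
import csv
from pathlib import Path


def csv_row_counts(experiment_dir):
    """Map each CSV file under experiment_dir to its number of data rows (header excluded)."""
    counts = {}
    for path in sorted(Path(experiment_dir).rglob("*.csv")):
        with path.open(newline="") as f:
            rows = list(csv.reader(f))
        # Subtract the header row; a completely empty file counts as 0.
        counts[str(path.relative_to(experiment_dir))] = max(len(rows) - 1, 0)
    return counts
```

For the run in this report, the call would be `csv_row_counts("data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139")`. Any file reporting 0 rows would be a candidate cause worth mentioning alongside the empty graph.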

Additional context

[two images attached]

Error logs

[2021-05-04 16:51:39,536 PID:3860 INFO run_lab.py get_spec_and_run] Running lab spec_file:assignment_2/code_3_6.json spec_name:sarsa_epsilon_greedy_cartpole in mode:search
[2021-05-04 16:51:39,546 PID:3860 INFO search.py run_ray_search] Running ray search for spec sarsa_epsilon_greedy_cartpole
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/2 GPUs
Memory usage on this node: 3.8/33.6 GB

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 4/16 CPUs, 0/2 GPUs
Memory usage on this node: 3.8/33.6 GB
Result logdir: /home/iwan/ray_results/sarsa_epsilon_greedy_cartpole
Number of trials: 7 ({'RUNNING': 1, 'PENDING': 6})
PENDING trials:
 - ray_trainable_1_agent.0.net.optim_spec.lr=0.001,trial_index=1:   PENDING
 - ray_trainable_2_agent.0.net.optim_spec.lr=0.001,trial_index=2:   PENDING
 - ray_trainable_3_agent.0.net.optim_spec.lr=0.005,trial_index=3:   PENDING
 - ray_trainable_4_agent.0.net.optim_spec.lr=0.01,trial_index=4:    PENDING
 - ray_trainable_5_agent.0.net.optim_spec.lr=0.05,trial_index=5:    PENDING
 - ray_trainable_6_agent.0.net.optim_spec.lr=0.1,trial_index=6: PENDING
RUNNING trials:
 - ray_trainable_0_agent.0.net.optim_spec.lr=0.0005,trial_index=0:  RUNNING

(pid=3914) [2021-05-04 16:51:41,303 PID:3914 INFO logger.py info] Running sessions
(pid=3913) [2021-05-04 16:51:41,303 PID:3913 INFO logger.py info] Running sessions
(pid=3926) [2021-05-04 16:51:41,303 PID:3926 INFO logger.py info] Running sessions
(pid=3925) [2021-05-04 16:51:41,286 PID:3925 INFO logger.py info] Running sessions
(pid=3914) [2021-05-04 16:51:41,344 PID:4083 INFO openai.py __init__] OpenAIEnv:
(pid=3914) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=3914) - eval_frequency = 2000
(pid=3914) - log_frequency = 10000
(pid=3914) - frame_op = None
(pid=3914) - frame_op_len = None
(pid=3914) - image_downsize = (84, 84)
(pid=3914) - normalize_state = False
(pid=3914) - reward_scale = None
(pid=3914) - num_envs = 1
(pid=3914) - name = CartPole-v0
(pid=3914) - max_t = 200
(pid=3914) - max_frame = 100000
(pid=3914) - to_render = False
(pid=3914) - is_venv = False
(pid=3914) - clock_speed = 1
(pid=3914) - clock = <slm_lab.env.base.Clock object at 0x7fe01ea91ba8>
(pid=3914) - done = False
(pid=3914) - total_reward = nan
(pid=3914) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=3914) - observation_space = Box(4,)
(pid=3914) - action_space = Discrete(2)
(pid=3914) - observable_dim = {'state': 4}
(pid=3914) - action_dim = 2
(pid=3914) - is_discrete = True
(pid=3914) [2021-05-04 16:51:41,351 PID:4079 INFO openai.py __init__] OpenAIEnv:
(pid=3914) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=3914) - eval_frequency = 2000
(pid=3914) - log_frequency = 10000
(pid=3914) - frame_op = None
(pid=3914) - frame_op_len = None
(pid=3914) - image_downsize = (84, 84)
(pid=3914) - normalize_state = False
(pid=3914) - reward_scale = None
(pid=3914) - num_envs = 1
(pid=3914) - name = CartPole-v0
(pid=3914) - max_t = 200
(pid=3914) - max_frame = 100000
(pid=3914) - to_render = False
(pid=3914) - is_venv = False
(pid=3914) - clock_speed = 1
(pid=3914) - clock = <slm_lab.env.base.Clock object at 0x7fe284ef5cf8>
(pid=3914) - done = False
(pid=3914) - total_reward = nan
(pid=3914) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=3914) - observation_space = Box(4,)
(pid=3914) - action_space = Discrete(2)
(pid=3914) - observable_dim = {'state': 4}
(pid=3914) - action_dim = 2
(pid=3914) - is_discrete = True
(pid=3927) [2021-05-04 16:53:24,046 PID:6133 INFO logger.py info] Session:
(pid=3927) - spec = {'cuda_offset': 0,
(pid=3927)  'distributed': False,
(pid=3927)  'eval_frequency': 2000,
(pid=3927)  'experiment': 0,
(pid=3927)  'experiment_ts': '2021_05_04_165139',
(pid=3927)  'git_sha': 'dda02d00031553aeda4c49c5baa7d0706c53996b',
(pid=3927)  'graph_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/graph/sarsa_epsilon_greedy_cartpole_t6_s0',
(pid=3927)  'info_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/info/sarsa_epsilon_greedy_cartpole_t6_s0',
(pid=3927)  'log_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/log/sarsa_epsilon_greedy_cartpole_t6_s0',
(pid=3927)  'max_session': 4,
(pid=3927)  'max_trial': 1,
(pid=3927)  'model_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/model/sarsa_epsilon_greedy_cartpole_t6_s0',
(pid=3927)  'prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/sarsa_epsilon_greedy_cartpole_t6_s0',
(pid=3927)  'random_seed': 1620714804,
(pid=3927)  'resume': False,
(pid=3927)  'rigorous_eval': 0,
(pid=3927)  'session': 0,
(pid=3927)  'trial': 6}
(pid=3927) - index = 0
(pid=3927) - agent = <slm_lab.agent.Agent object at 0x7f9cecb11160>
(pid=3927) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7f9cecba8668>
(pid=3927) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7f9cecba8668>
(pid=3927) [2021-05-04 16:53:24,046 PID:6133 INFO logger.py info] Running RL loop for trial 6 session 0
(pid=3927) [2021-05-04 16:53:24,053 PID:6136 INFO base.py end_init_nets] Initialized algorithm models for lab_mode: search
(pid=3927) [2021-05-04 16:53:24,054 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.1  explore_var: 1  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:24,057 PID:6136 INFO base.py __init__] SARSA:
(pid=3927) - agent = <slm_lab.agent.Agent object at 0x7f9cecb134a8>
(pid=3927) - action_pdtype = Argmax
(pid=3927) - action_policy = <function epsilon_greedy at 0x7f9cf98f8400>
(pid=3927) - explore_var_spec = {'end_step': 10000,
(pid=3927)  'end_val': 0.05,
(pid=3927)  'name': 'linear_decay',
(pid=3927)  'start_step': 0,
(pid=3927)  'start_val': 1.0}
(pid=3927) - gamma = 0.99
(pid=3927) - training_frequency = 5
(pid=3927) - to_train = 0
(pid=3927) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7f9f52f49438>
(pid=3927) - net = MLPNet(
(pid=3927)   (model): Sequential(
(pid=3927)     (0): Linear(in_features=4, out_features=64, bias=True)
(pid=3927)     (1): SELU()
(pid=3927)   )
(pid=3927)   (model_tail): Sequential(
(pid=3927)     (0): Linear(in_features=64, out_features=2, bias=True)
(pid=3927)   )
(pid=3927)   (loss_fn): MSELoss()
(pid=3927) )
(pid=3927) - net_names = ['net']
(pid=3927) - optim = RMSprop (
(pid=3927) Parameter Group 0
(pid=3927)     alpha: 0.99
(pid=3927)     centered: False
(pid=3927)     eps: 1e-08
(pid=3927)     lr: 0.1
(pid=3927)     momentum: 0
(pid=3927)     weight_decay: 0
(pid=3927) )
(pid=3927) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7f9cecb2ca90>
(pid=3927) - global_net = None
(pid=3927) [2021-05-04 16:53:24,059 PID:6136 INFO __init__.py __init__] Agent:
(pid=3927) - spec = {'cuda_offset': 0,
(pid=3927)  'distributed': False,
(pid=3927)  'eval_frequency': 2000,
(pid=3927)  'experiment': 0,
(pid=3927)  'experiment_ts': '2021_05_04_165139',
(pid=3927)  'git_sha': 'dda02d00031553aeda4c49c5baa7d0706c53996b',
(pid=3927)  'graph_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/graph/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'info_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/info/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'log_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/log/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'max_session': 4,
(pid=3927)  'max_trial': 1,
(pid=3927)  'model_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/model/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'random_seed': 1620717804,
(pid=3927)  'resume': False,
(pid=3927)  'rigorous_eval': 0,
(pid=3927)  'session': 3,
(pid=3927)  'trial': 6}
(pid=3927) - agent_spec = {'algorithm': {'action_pdtype': 'Argmax',
(pid=3927)                'action_policy': 'epsilon_greedy',
(pid=3927)                'explore_var_spec': {'end_step': 10000,
(pid=3927)                                     'end_val': 0.05,
(pid=3927)                                     'name': 'linear_decay',
(pid=3927)                                     'start_step': 0,
(pid=3927)                                     'start_val': 1.0},
(pid=3927)                'gamma': 0.99,
(pid=3927)                'name': 'SARSA',
(pid=3927)                'training_frequency': 5},
(pid=3927)  'memory': {'name': 'OnPolicyBatchReplay'},
(pid=3927)  'name': 'SARSA',
(pid=3927)  'net': {'clip_grad_val': 0.5,
(pid=3927)          'hid_layers': [64],
(pid=3927)          'hid_layers_activation': 'selu',
(pid=3927)          'loss_spec': {'name': 'MSELoss'},
(pid=3927)          'lr_scheduler_spec': None,
(pid=3927)          'optim_spec': {'lr': 0.1, 'name': 'RMSprop'},
(pid=3927)          'type': 'MLPNet'}}
(pid=3927) - name = SARSA
(pid=3927) - body = body: {
(pid=3927)   "agent": "<slm_lab.agent.Agent object at 0x7f9cecb134a8>",
(pid=3927)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7f9cecba8978>",
(pid=3927)   "a": 0,
(pid=3927)   "e": 0,
(pid=3927)   "b": 0,
(pid=3927)   "aeb": "(0, 0, 0)",
(pid=3927)   "explore_var": 1.0,
(pid=3927)   "entropy_coef": NaN,
(pid=3927)   "loss": NaN,
(pid=3927)   "mean_entropy": NaN,
(pid=3927)   "mean_grad_norm": NaN,
(pid=3927)   "best_total_reward_ma": -Infinity,
(pid=3927)   "total_reward_ma": NaN,
(pid=3927)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=3927)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=3927)   "observation_space": "Box(4,)",
(pid=3927)   "action_space": "Discrete(2)",
(pid=3927)   "observable_dim": {
(pid=3927)     "state": 4
(pid=3927)   },
(pid=3927)   "state_dim": 4,
(pid=3927)   "action_dim": 2,
(pid=3927)   "is_discrete": true,
(pid=3927)   "action_type": "discrete",
(pid=3927)   "action_pdtype": "Argmax",
(pid=3927)   "ActionPD": "<class 'slm_lab.lib.distribution.Argmax'>",
(pid=3927)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyBatchReplay object at 0x7f9cecb13630>"
(pid=3927) }
(pid=3927) - algorithm = <slm_lab.agent.algorithm.sarsa.SARSA object at 0x7f9cecb2c7b8>
(pid=3927) [2021-05-04 16:53:24,060 PID:6136 INFO logger.py info] Session:
(pid=3927) - spec = {'cuda_offset': 0,
(pid=3927)  'distributed': False,
(pid=3927)  'eval_frequency': 2000,
(pid=3927)  'experiment': 0,
(pid=3927)  'experiment_ts': '2021_05_04_165139',
(pid=3927)  'git_sha': 'dda02d00031553aeda4c49c5baa7d0706c53996b',
(pid=3927)  'graph_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/graph/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'info_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/info/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'log_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/log/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'max_session': 4,
(pid=3927)  'max_trial': 1,
(pid=3927)  'model_prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/model/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'prepath': 'data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139/sarsa_epsilon_greedy_cartpole_t6_s3',
(pid=3927)  'random_seed': 1620717804,
(pid=3927)  'resume': False,
(pid=3927)  'rigorous_eval': 0,
(pid=3927)  'session': 3,
(pid=3927)  'trial': 6}
(pid=3927) - index = 3
(pid=3927) - agent = <slm_lab.agent.Agent object at 0x7f9cecb134a8>
(pid=3927) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7f9cecba8978>
(pid=3927) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7f9cecba8978>
(pid=3927) [2021-05-04 16:53:24,060 PID:6136 INFO logger.py info] Running RL loop for trial 6 session 3
(pid=3927) [2021-05-04 16:53:24,067 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.1  explore_var: 1  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:28,061 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 240  t: 52  wall_t: 4  opt_step: 12000  frame: 10000  fps: 2500  total_reward: 97  total_reward_ma: 97  loss: 17.4592  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:28,120 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 427  t: 7  wall_t: 4  opt_step: 12000  frame: 10000  fps: 2500  total_reward: 67  total_reward_ma: 67  loss: 0.383154  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:28,620 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 292  t: 6  wall_t: 4  opt_step: 12000  frame: 10000  fps: 2500  total_reward: 10  total_reward_ma: 10  loss: 117.837  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:28,679 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 277  t: 84  wall_t: 4  opt_step: 12000  frame: 10000  fps: 2500  total_reward: 20  total_reward_ma: 20  loss: 61.2264  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:30,219 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 224  t: 14  wall_t: 6  opt_step: 12000  frame: 10000  fps: 1666.67  total_reward: 200  total_reward_ma: 200  loss: 4.94699  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:30,304 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 253  t: 11  wall_t: 6  opt_step: 12000  frame: 10000  fps: 1666.67  total_reward: 11  total_reward_ma: 11  loss: 37.0297  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:30,310 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 253  t: 11  wall_t: 6  opt_step: 12000  frame: 10000  fps: 1666.67  total_reward: 11  total_reward_ma: 11  loss: 37.0297  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:30,457 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 226  t: 170  wall_t: 7  opt_step: 12000  frame: 10000  fps: 1428.57  total_reward: 140  total_reward_ma: 140  loss: 0.199613  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:30,476 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 330  t: 160  wall_t: 7  opt_step: 12000  frame: 10000  fps: 1428.57  total_reward: 185  total_reward_ma: 185  loss: 0.56581  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:30,566 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 431  t: 9  wall_t: 7  opt_step: 12000  frame: 10000  fps: 1428.57  total_reward: 14  total_reward_ma: 14  loss: 0.321296  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:30,591 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 394  t: 13  wall_t: 7  opt_step: 12000  frame: 10000  fps: 1428.57  total_reward: 27  total_reward_ma: 27  loss: 1.83942  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:31,291 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 367  t: 10  wall_t: 7  opt_step: 12000  frame: 10000  fps: 1428.57  total_reward: 10  total_reward_ma: 10  loss: 37.3157  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:31,297 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 367  t: 10  wall_t: 7  opt_step: 12000  frame: 10000  fps: 1428.57  total_reward: 10  total_reward_ma: 10  loss: 37.3157  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:31,610 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 298  t: 6  wall_t: 7  opt_step: 12000  frame: 10000  fps: 1428.57  total_reward: 19  total_reward_ma: 19  loss: 2.98665  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:33,588 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 317  t: 39  wall_t: 10  opt_step: 24000  frame: 20000  fps: 2000  total_reward: 13  total_reward_ma: 55  loss: 5.29746  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:33,597 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 55  strength: 33.14  max_strength: 75.14  final_strength: -8.86  sample_efficiency: 0.000106684  training_efficiency: 8.89031e-05  stability: -0.117913
(pid=3919) [2021-05-04 16:53:33,729 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 741  t: 43  wall_t: 10  opt_step: 24000  frame: 20000  fps: 2000  total_reward: 72  total_reward_ma: 69.5  loss: 0.859354  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:33,739 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df metrics] final_return_ma: 69.5  strength: 47.64  max_strength: 50.14  final_strength: 50.14  sample_efficiency: 7.36881e-05  training_efficiency: 6.14067e-05  stability: 1
(pid=3927) [2021-05-04 16:53:34,169 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 1045  t: 3  wall_t: 10  opt_step: 24000  frame: 20000  fps: 2000  total_reward: 15  total_reward_ma: 12.5  loss: 10.5568  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:34,178 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df metrics] final_return_ma: 12.5  strength: -9.36  max_strength: -6.86  final_strength: -6.86  sample_efficiency: 8.16774e-05  training_efficiency: 6.80645e-05  stability: 1
(pid=3927) [2021-05-04 16:53:34,261 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 1044  t: 28  wall_t: 10  opt_step: 24000  frame: 20000  fps: 2000  total_reward: 26  total_reward_ma: 23  loss: 10.2876  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:34,271 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df metrics] final_return_ma: 23  strength: 1.14  max_strength: 4.14  final_strength: 4.14  sample_efficiency: 9.21049e-06  training_efficiency: 7.67541e-06  stability: 1
(pid=3919) [2021-05-04 16:53:37,573 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 835  t: 4  wall_t: 14  opt_step: 24000  frame: 20000  fps: 1428.57  total_reward: 13  total_reward_ma: 99  loss: 17498.1  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:37,586 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 99  strength: 77.14  max_strength: 163.14  final_strength: -8.86  sample_efficiency: 0.000102871  training_efficiency: 8.57262e-05  stability: -0.0543091
(pid=3917) [2021-05-04 16:53:38,632 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 298  t: 81  wall_t: 15  opt_step: 24000  frame: 20000  fps: 1333.33  total_reward: 66  total_reward_ma: 133  loss: 12.1518  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:38,645 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df metrics] final_return_ma: 133  strength: 111.14  max_strength: 178.14  final_strength: 44.14  sample_efficiency: 9.00711e-05  training_efficiency: 7.50592e-05  stability: 0.247783
(pid=3917) [2021-05-04 16:53:38,772 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 688  t: 15  wall_t: 15  opt_step: 24000  frame: 20000  fps: 1333.33  total_reward: 8  total_reward_ma: 9.5  loss: 1.01667  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:38,785 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df metrics] final_return_ma: 9.5  strength: -12.36  max_strength: -10.86  final_strength: -13.86  sample_efficiency: 7.1966e-05  training_efficiency: 5.99717e-05  stability: 0.723757
(pid=3917) [2021-05-04 16:53:39,092 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 296  t: 80  wall_t: 15  opt_step: 24000  frame: 20000  fps: 1333.33  total_reward: 200  total_reward_ma: 170  loss: 0.940201  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:39,106 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df metrics] final_return_ma: 170  strength: 148.14  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 6.99372e-05  training_efficiency: 5.8281e-05  stability: 1
(pid=3919) [2021-05-04 16:53:39,214 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 875  t: 49  wall_t: 15  opt_step: 24000  frame: 20000  fps: 1333.33  total_reward: 11  total_reward_ma: 12.5  loss: 0.807788  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:39,228 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df metrics] final_return_ma: 12.5  strength: -9.36  max_strength: -7.86  final_strength: -10.86  sample_efficiency: 7.09936e-05  training_efficiency: 5.91613e-05  stability: 0.618321
(pid=3919) [2021-05-04 16:53:39,239 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 804  t: 18  wall_t: 15  opt_step: 24000  frame: 20000  fps: 1333.33  total_reward: 30  total_reward_ma: 28.5  loss: 3.89151  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:39,254 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df metrics] final_return_ma: 28.5  strength: 6.64  max_strength: 8.14  final_strength: 8.14  sample_efficiency: 6.93524e-05  training_efficiency: 5.77937e-05  stability: 1
(pid=3919) [2021-05-04 16:53:39,423 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 857  t: 139  wall_t: 16  opt_step: 36000  frame: 30000  fps: 1875  total_reward: 200  total_reward_ma: 113  loss: 0.0415456  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:39,431 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df metrics] final_return_ma: 113  strength: 91.14  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 4.73959e-05  training_efficiency: 3.94966e-05  stability: 1
(pid=3927) [2021-05-04 16:53:39,782 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 1586  t: 7  wall_t: 15  opt_step: 36000  frame: 30000  fps: 2000  total_reward: 42  total_reward_ma: 22.3333  loss: 0.867192  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:39,790 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df metrics] final_return_ma: 22.3333  strength: 0.473333  max_strength: 20.14  final_strength: 20.14  sample_efficiency: -0.000603992  training_efficiency: -0.000503326  stability: 1
(pid=3927) [2021-05-04 16:53:39,961 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 1271  t: 25  wall_t: 15  opt_step: 24000  frame: 20000  fps: 1333.33  total_reward: 11  total_reward_ma: 10.5  loss: 7.17299  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:39,968 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 1738  t: 4  wall_t: 15  opt_step: 36000  frame: 30000  fps: 2000  total_reward: 46  total_reward_ma: 30.6667  loss: 1.031  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:39,975 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 10.5  strength: -11.36  max_strength: -10.86  final_strength: -10.86  sample_efficiency: 7.61004e-05  training_efficiency: 6.3417e-05  stability: 1
(pid=3927) [2021-05-04 16:53:39,979 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df metrics] final_return_ma: 30.6667  strength: 8.80667  max_strength: 24.14  final_strength: 24.14  sample_efficiency: 3.12516e-05  training_efficiency: 2.6043e-05  stability: 1
(pid=3927) [2021-05-04 16:53:40,414 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 1003  t: 1  wall_t: 16  opt_step: 24000  frame: 20000  fps: 1250  total_reward: 11  total_reward_ma: 15  loss: 107.965  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:40,428 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 15  strength: -6.86  max_strength: -2.86  final_strength: -10.86  sample_efficiency: 6.04227e-05  training_efficiency: 5.03523e-05  stability: -1.7972
(pid=3917) [2021-05-04 16:53:41,866 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 434  t: 8  wall_t: 18  opt_step: 36000  frame: 30000  fps: 1666.67  total_reward: 41  total_reward_ma: 50.3333  loss: 6.48499  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:41,877 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 50.3333  strength: 28.4733  max_strength: 75.14  final_strength: 19.14  sample_efficiency: 9.02482e-05  training_efficiency: 7.52068e-05  stability: -0.267351
(pid=3919) [2021-05-04 16:53:43,375 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 1306  t: 7  wall_t: 20  opt_step: 36000  frame: 30000  fps: 1500  total_reward: 9  total_reward_ma: 69  loss: 5.27055  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:43,390 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 69  strength: 47.14  max_strength: 163.14  final_strength: -12.86  sample_efficiency: 0.000109195  training_efficiency: 9.09957e-05  stability: -0.140783
(pid=3919) [2021-05-04 16:53:45,081 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 912  t: 195  wall_t: 21  opt_step: 48000  frame: 40000  fps: 1904.76  total_reward: 200  total_reward_ma: 134.75  loss: 0.207668  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:45,090 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df metrics] final_return_ma: 134.75  strength: 112.89  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 3.85608e-05  training_efficiency: 3.2134e-05  stability: 1
(pid=3927) [2021-05-04 16:53:45,479 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 2422  t: 7  wall_t: 21  opt_step: 48000  frame: 40000  fps: 1904.76  total_reward: 13  total_reward_ma: 26.25  loss: 6.85835  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:45,481 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 2237  t: 10  wall_t: 21  opt_step: 48000  frame: 40000  fps: 1904.76  total_reward: 12  total_reward_ma: 19.75  loss: 12.048  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:45,486 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df metrics] final_return_ma: 26.25  strength: 4.39  max_strength: 24.14  final_strength: -8.86  sample_efficiency: 3.44058e-05  training_efficiency: 2.86715e-05  stability: -0.249054
(pid=3927) [2021-05-04 16:53:45,488 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df metrics] final_return_ma: 19.75  strength: -2.11  max_strength: 20.14  final_strength: -9.86  sample_efficiency: 0.000130825  training_efficiency: 0.000109021  stability: -20.1268
(pid=3917) [2021-05-04 16:53:47,089 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 518  t: 86  wall_t: 23  opt_step: 36000  frame: 30000  fps: 1304.35  total_reward: 153  total_reward_ma: 139.667  loss: 3.09447  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:47,100 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df metrics] final_return_ma: 139.667  strength: 117.807  max_strength: 178.14  final_strength: 131.14  sample_efficiency: 6.9018e-05  training_efficiency: 5.7515e-05  stability: 0.397157
(pid=3917) [2021-05-04 16:53:47,256 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 1000  t: 53  wall_t: 23  opt_step: 36000  frame: 30000  fps: 1304.35  total_reward: 200  total_reward_ma: 73  loss: 1.48546  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:47,268 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df metrics] final_return_ma: 73  strength: 51.14  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 2.71086e-05  training_efficiency: 2.25905e-05  stability: 0.878641
(pid=3917) [2021-05-04 16:53:47,770 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 527  t: 1  wall_t: 24  opt_step: 36000  frame: 30000  fps: 1250  total_reward: 9  total_reward_ma: 116.333  loss: 130.66  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:47,781 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df metrics] final_return_ma: 116.333  strength: 94.4733  max_strength: 178.14  final_strength: -12.86  sample_efficiency: 7.15981e-05  training_efficiency: 5.96651e-05  stability: 0.35534
(pid=3919) [2021-05-04 16:53:47,874 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 1155  t: 18  wall_t: 24  opt_step: 36000  frame: 30000  fps: 1250  total_reward: 17  total_reward_ma: 14  loss: 39.2763  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:47,886 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df metrics] final_return_ma: 14  strength: -7.86  max_strength: -4.86  final_strength: -4.86  sample_efficiency: 6.32316e-05  training_efficiency: 5.2693e-05  stability: 0.839744
(pid=3919) [2021-05-04 16:53:47,887 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 1215  t: 4  wall_t: 24  opt_step: 36000  frame: 30000  fps: 1250  total_reward: 15  total_reward_ma: 24  loss: 156121  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:47,898 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df metrics] final_return_ma: 24  strength: 2.14  max_strength: 8.14  final_strength: -6.86  sample_efficiency: 0.00010784  training_efficiency: 8.98667e-05  stability: -0.129518
(pid=3927) [2021-05-04 16:53:48,633 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 1921  t: 50  wall_t: 24  opt_step: 36000  frame: 30000  fps: 1250  total_reward: 55  total_reward_ma: 25.3333  loss: 20.6762  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:48,652 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 25.3333  strength: 3.47333  max_strength: 33.14  final_strength: 33.14  sample_efficiency: -5.99169e-05  training_efficiency: -4.99307e-05  stability: 1
(pid=3927) [2021-05-04 16:53:49,077 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 1670  t: 38  wall_t: 25  opt_step: 36000  frame: 30000  fps: 1200  total_reward: 83  total_reward_ma: 37.6667  loss: 3.53205  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:49,089 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 37.6667  strength: 15.8067  max_strength: 61.14  final_strength: 61.14  sample_efficiency: 2.54956e-05  training_efficiency: 2.12463e-05  stability: 0.41691
(pid=3919) [2021-05-04 16:53:49,120 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 1857  t: 2  wall_t: 25  opt_step: 48000  frame: 40000  fps: 1600  total_reward: 18  total_reward_ma: 56.25  loss: 675317  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:49,131 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 56.25  strength: 34.39  max_strength: 163.14  final_strength: -3.86  sample_efficiency: 0.000111557  training_efficiency: 9.29645e-05  stability: -0.24452
(pid=3917) [2021-05-04 16:53:50,538 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 529  t: 91  wall_t: 27  opt_step: 48000  frame: 40000  fps: 1481.48  total_reward: 12  total_reward_ma: 40.75  loss: 14.1391  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:50,549 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 40.75  strength: 18.89  max_strength: 75.14  final_strength: -9.86  sample_efficiency: 9.87626e-05  training_efficiency: 8.23021e-05  stability: -0.322875
(pid=3919) [2021-05-04 16:53:50,698 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 969  t: 196  wall_t: 27  opt_step: 60000  frame: 50000  fps: 1851.85  total_reward: 200  total_reward_ma: 147.8  loss: 0.55013  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:50,705 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df metrics] final_return_ma: 147.8  strength: 125.94  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 3.331e-05  training_efficiency: 2.77583e-05  stability: 1
(pid=3927) [2021-05-04 16:53:50,984 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 3093  t: 61  wall_t: 26  opt_step: 60000  frame: 50000  fps: 1923.08  total_reward: 25  total_reward_ma: 26  loss: 0.702367  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:50,992 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df metrics] final_return_ma: 26  strength: 4.14  max_strength: 24.14  final_strength: 3.14  sample_efficiency: 3.22206e-05  training_efficiency: 2.68505e-05  stability: -0.879271
(pid=3927) [2021-05-04 16:53:51,150 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 2826  t: 3  wall_t: 27  opt_step: 60000  frame: 50000  fps: 1851.85  total_reward: 8  total_reward_ma: 17.4  loss: 18.9916  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:51,160 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df metrics] final_return_ma: 17.4  strength: -4.46  max_strength: 20.14  final_strength: -13.86  sample_efficiency: 6.19447e-05  training_efficiency: 5.16206e-05  stability: -3.02843
(pid=3919) [2021-05-04 16:53:54,782 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 2369  t: 35  wall_t: 31  opt_step: 60000  frame: 50000  fps: 1612.9  total_reward: 45  total_reward_ma: 54  loss: 29.9116  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:54,789 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 54  strength: 32.14  max_strength: 163.14  final_strength: 23.14  sample_efficiency: 9.83736e-05  training_efficiency: 8.1978e-05  stability: -0.279442
(pid=3917) [2021-05-04 16:53:55,547 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 880  t: 2  wall_t: 32  opt_step: 48000  frame: 40000  fps: 1250  total_reward: 8  total_reward_ma: 106.75  loss: 168.905  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:55,558 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df metrics] final_return_ma: 106.75  strength: 84.89  max_strength: 178.14  final_strength: -13.86  sample_efficiency: 7.08147e-05  training_efficiency: 5.90122e-05  stability: 0.210571
(pid=3917) [2021-05-04 16:53:55,738 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 1136  t: 85  wall_t: 32  opt_step: 48000  frame: 40000  fps: 1250  total_reward: 17  total_reward_ma: 59  loss: 0.272864  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:55,749 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df metrics] final_return_ma: 59  strength: 37.14  max_strength: 178.14  final_strength: -4.86  sample_efficiency: 2.71776e-05  training_efficiency: 2.2648e-05  stability: -0.212358
(pid=3919) [2021-05-04 16:53:56,302 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 1021  t: 2  wall_t: 32  opt_step: 72000  frame: 60000  fps: 1875  total_reward: 200  total_reward_ma: 156.5  loss: 881.388  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:56,311 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df metrics] final_return_ma: 156.5  strength: 134.64  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 2.96399e-05  training_efficiency: 2.46999e-05  stability: 1
(pid=3917) [2021-05-04 16:53:56,431 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 738  t: 10  wall_t: 32  opt_step: 48000  frame: 40000  fps: 1250  total_reward: 200  total_reward_ma: 137.25  loss: 2.31759  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:56,442 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df metrics] final_return_ma: 137.25  strength: 115.39  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 5.36135e-05  training_efficiency: 4.46779e-05  stability: 0.326088
(pid=3919) [2021-05-04 16:53:56,500 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 1744  t: 16  wall_t: 33  opt_step: 48000  frame: 40000  fps: 1212.12  total_reward: 43  total_reward_ma: 21.25  loss: 2.74655  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:56,523 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 3562  t: 33  wall_t: 32  opt_step: 72000  frame: 60000  fps: 1875  total_reward: 9  total_reward_ma: 23.1667  loss: 4.21795  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:56,530 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df metrics] final_return_ma: 23.1667  strength: 1.30667  max_strength: 24.14  final_strength: -12.86  sample_efficiency: 5.77339e-05  training_efficiency: 4.81115e-05  stability: -1.36715
(pid=3919) [2021-05-04 16:53:56,512 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df metrics] final_return_ma: 21.25  strength: -0.610001  max_strength: 21.14  final_strength: 21.14  sample_efficiency: 0.000394467  training_efficiency: 0.000328722  stability: 0.872774
(pid=3919) [2021-05-04 16:53:56,526 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 1627  t: 24  wall_t: 33  opt_step: 48000  frame: 40000  fps: 1212.12  total_reward: 21  total_reward_ma: 23.25  loss: 15.2223  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:53:56,537 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df metrics] final_return_ma: 23.25  strength: 1.39  max_strength: 8.14  final_strength: -0.860001  sample_efficiency: 0.000120654  training_efficiency: 0.000100545  stability: -1.33645
(pid=3927) [2021-05-04 16:53:56,761 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 3418  t: 4  wall_t: 32  opt_step: 72000  frame: 60000  fps: 1875  total_reward: 14  total_reward_ma: 16.8333  loss: 6.47685  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:56,769 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df metrics] final_return_ma: 16.8333  strength: -5.02667  max_strength: 20.14  final_strength: -7.86  sample_efficiency: 5.01448e-05  training_efficiency: 4.17873e-05  stability: -0.524663
(pid=3927) [2021-05-04 16:53:57,299 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 2655  t: 32  wall_t: 33  opt_step: 48000  frame: 40000  fps: 1212.12  total_reward: 32  total_reward_ma: 27  loss: 37.1835  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:57,311 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 27  strength: 5.14  max_strength: 33.14  final_strength: 10.14  sample_efficiency: -1.80367e-05  training_efficiency: -1.50305e-05  stability: -1.20729
(pid=3927) [2021-05-04 16:53:57,315 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 2655  t: 32  wall_t: 33  opt_step: 48000  frame: 40000  fps: 1212.12  total_reward: 32  total_reward_ma: 27  loss: 37.1835  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:57,327 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 27  strength: 5.14  max_strength: 33.14  final_strength: 10.14  sample_efficiency: -1.80367e-05  training_efficiency: -1.50305e-05  stability: -1.20729
(pid=3927) [2021-05-04 16:53:57,745 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 2373  t: 7  wall_t: 33  opt_step: 48000  frame: 40000  fps: 1212.12  total_reward: 10  total_reward_ma: 30.75  loss: 3.7728  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:53:57,756 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 30.75  strength: 8.89  max_strength: 61.14  final_strength: -11.86  sample_efficiency: 2.56609e-05  training_efficiency: 2.1384e-05  stability: -0.70814
(pid=3917) [2021-05-04 16:53:59,246 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 1254  t: 2  wall_t: 35  opt_step: 60000  frame: 50000  fps: 1428.57  total_reward: 13  total_reward_ma: 35.2  loss: 4.70696  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:53:59,257 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 35.2  strength: 13.34  max_strength: 75.14  final_strength: -8.86  sample_efficiency: 0.000109225  training_efficiency: 9.10207e-05  stability: -0.4955
(pid=3919) [2021-05-04 16:54:00,486 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 2878  t: 5  wall_t: 37  opt_step: 72000  frame: 60000  fps: 1621.62  total_reward: 14  total_reward_ma: 47.3333  loss: 43.1433  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:00,493 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 47.3333  strength: 25.4733  max_strength: 163.14  final_strength: -7.86  sample_efficiency: 0.000102575  training_efficiency: 8.54795e-05  stability: -0.288115
(pid=3919) [2021-05-04 16:54:01,921 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 1078  t: 137  wall_t: 38  opt_step: 84000  frame: 70000  fps: 1842.11  total_reward: 200  total_reward_ma: 162.714  loss: 0.908814  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:01,929 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df metrics] final_return_ma: 162.714  strength: 140.854  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 2.68658e-05  training_efficiency: 2.23882e-05  stability: 1
(pid=3927) [2021-05-04 16:54:02,007 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 4233  t: 1  wall_t: 37  opt_step: 84000  frame: 70000  fps: 1891.89  total_reward: 9  total_reward_ma: 21.1429  loss: 50.4187  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:02,015 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df metrics] final_return_ma: 21.1429  strength: -0.717143  max_strength: 24.14  final_strength: -12.86  sample_efficiency: -5.35695e-05  training_efficiency: -4.46412e-05  stability: -5.25
(pid=3927) [2021-05-04 16:54:02,479 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 4103  t: 4  wall_t: 38  opt_step: 84000  frame: 70000  fps: 1842.11  total_reward: 10  total_reward_ma: 15.8571  loss: 125.019  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:02,490 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df metrics] final_return_ma: 15.8571  strength: -6.00286  max_strength: 20.14  final_strength: -11.86  sample_efficiency: 4.00237e-05  training_efficiency: 3.33531e-05  stability: -0.259947
(pid=3917) [2021-05-04 16:54:03,996 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 1575  t: 167  wall_t: 40  opt_step: 60000  frame: 50000  fps: 1250  total_reward: 200  total_reward_ma: 125.4  loss: 0.857421  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:04,014 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df metrics] final_return_ma: 125.4  strength: 103.54  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 5.33294e-05  training_efficiency: 4.44412e-05  stability: 0.178348
(pid=3917) [2021-05-04 16:54:04,194 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 1310  t: 129  wall_t: 40  opt_step: 60000  frame: 50000  fps: 1250  total_reward: 126  total_reward_ma: 72.4  loss: 0.318435  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:04,205 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df metrics] final_return_ma: 72.4  strength: 50.54  max_strength: 178.14  final_strength: 104.14  sample_efficiency: 2.42196e-05  training_efficiency: 2.0183e-05  stability: -0.252019
(pid=3917) [2021-05-04 16:54:05,068 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 868  t: 121  wall_t: 41  opt_step: 60000  frame: 50000  fps: 1219.51  total_reward: 200  total_reward_ma: 149.8  loss: 0.521539  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:05,079 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df metrics] final_return_ma: 149.8  strength: 127.94  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 4.4253e-05  training_efficiency: 3.68775e-05  stability: 0.586186
(pid=3919) [2021-05-04 16:54:05,170 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 2063  t: 1  wall_t: 41  opt_step: 60000  frame: 50000  fps: 1219.51  total_reward: 13  total_reward_ma: 21.2  loss: 545933  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:05,177 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 2198  t: 35  wall_t: 41  opt_step: 60000  frame: 50000  fps: 1219.51  total_reward: 54  total_reward_ma: 27.8  loss: 8.51101  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:05,180 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df metrics] final_return_ma: 21.2  strength: -0.660001  max_strength: 8.14  final_strength: -8.86  sample_efficiency: -0.000149586  training_efficiency: -0.000124655  stability: -3.13669
(pid=3919) [2021-05-04 16:54:05,189 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df metrics] final_return_ma: 27.8  strength: 5.94  max_strength: 32.14  final_strength: 32.14  sample_efficiency: -1.07643e-05  training_efficiency: -8.97026e-06  stability: -0.229507
(pid=3927) [2021-05-04 16:54:05,936 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 3325  t: 13  wall_t: 41  opt_step: 60000  frame: 50000  fps: 1219.51  total_reward: 15  total_reward_ma: 24.6  loss: 3.1295  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:05,947 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 24.6  strength: 2.74  max_strength: 33.14  final_strength: -6.86  sample_efficiency: -3.70828e-05  training_efficiency: -3.09023e-05  stability: -0.945526
(pid=3919) [2021-05-04 16:54:06,145 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 3383  t: 9  wall_t: 42  opt_step: 84000  frame: 70000  fps: 1666.67  total_reward: 9  total_reward_ma: 41.8571  loss: 79.8052  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:06,152 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 41.8571  strength: 19.9971  max_strength: 163.14  final_strength: -12.86  sample_efficiency: 0.000110687  training_efficiency: 9.22389e-05  stability: -0.387071
(pid=3927) [2021-05-04 16:54:06,385 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 2985  t: 1  wall_t: 42  opt_step: 60000  frame: 50000  fps: 1190.48  total_reward: 20  total_reward_ma: 28.6  loss: 20.2023  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:06,397 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 28.6  strength: 6.74  max_strength: 61.14  final_strength: -1.86  sample_efficiency: 2.59733e-05  training_efficiency: 2.16444e-05  stability: -1.27784
(pid=3927) [2021-05-04 16:54:07,490 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 4790  t: 4  wall_t: 43  opt_step: 96000  frame: 80000  fps: 1860.47  total_reward: 11  total_reward_ma: 19.875  loss: 62.0882  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:07,497 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df metrics] final_return_ma: 19.875  strength: -1.985  max_strength: 24.14  final_strength: -10.86  sample_efficiency: -8.38595e-06  training_efficiency: -6.98829e-06  stability: -8.76095
(pid=3919) [2021-05-04 16:54:07,597 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 1135  t: 47  wall_t: 44  opt_step: 96000  frame: 80000  fps: 1818.18  total_reward: 200  total_reward_ma: 167.375  loss: 0.189801  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:07,605 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df metrics] final_return_ma: 167.375  strength: 145.515  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 2.46675e-05  training_efficiency: 2.05562e-05  stability: 1
(pid=3917) [2021-05-04 16:54:07,900 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 1549  t: 133  wall_t: 44  opt_step: 72000  frame: 60000  fps: 1363.64  total_reward: 116  total_reward_ma: 48.6667  loss: 9.72111  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:07,911 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 48.6667  strength: 26.8067  max_strength: 94.14  final_strength: 94.14  sample_efficiency: 5.50504e-05  training_efficiency: 4.58753e-05  stability: -0.694153
(pid=3927) [2021-05-04 16:54:08,173 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 4589  t: 12  wall_t: 44  opt_step: 96000  frame: 80000  fps: 1818.18  total_reward: 28  total_reward_ma: 17.375  loss: 6.95463  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:08,181 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df metrics] final_return_ma: 17.375  strength: -4.485  max_strength: 20.14  final_strength: 6.14  sample_efficiency: 4.47337e-05  training_efficiency: 3.72781e-05  stability: 0.0956688
(pid=3919) [2021-05-04 16:54:11,811 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 3910  t: 10  wall_t: 48  opt_step: 96000  frame: 80000  fps: 1666.67  total_reward: 12  total_reward_ma: 38.125  loss: 263.887  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:11,819 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 38.125  strength: 16.265  max_strength: 163.14  final_strength: -9.86  sample_efficiency: 0.000118127  training_efficiency: 9.84391e-05  stability: -0.514502
(pid=3917) [2021-05-04 16:54:12,416 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 1677  t: 1  wall_t: 49  opt_step: 72000  frame: 60000  fps: 1224.49  total_reward: 200  total_reward_ma: 137.833  loss: 492.941  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:12,427 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df metrics] final_return_ma: 137.833  strength: 115.973  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 4.39435e-05  training_efficiency: 3.66196e-05  stability: 0.461078
(pid=3917) [2021-05-04 16:54:12,634 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 1513  t: 104  wall_t: 49  opt_step: 72000  frame: 60000  fps: 1224.49  total_reward: 56  total_reward_ma: 69.6667  loss: 0.24508  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:12,645 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df metrics] final_return_ma: 69.6667  strength: 47.8067  max_strength: 178.14  final_strength: 34.14  sample_efficiency: 2.33207e-05  training_efficiency: 1.94339e-05  stability: -0.013059
(pid=3927) [2021-05-04 16:54:13,019 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 5531  t: 3  wall_t: 48  opt_step: 108000  frame: 90000  fps: 1875  total_reward: 12  total_reward_ma: 19  loss: 30.5229  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:13,026 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df metrics] final_return_ma: 19  strength: -2.86  max_strength: 24.14  final_strength: -9.86  sample_efficiency: -9.1738e-07  training_efficiency: -7.6448e-07  stability: -2.08564
(pid=3919) [2021-05-04 16:54:13,195 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 1192  t: 23  wall_t: 49  opt_step: 108000  frame: 90000  fps: 1836.73  total_reward: 18  total_reward_ma: 150.778  loss: 0.144138  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:13,203 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df metrics] final_return_ma: 150.778  strength: 128.918  max_strength: 178.14  final_strength: -3.86  sample_efficiency: 2.47126e-05  training_efficiency: 2.05938e-05  stability: 0.843659
(pid=3917) [2021-05-04 16:54:13,720 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 1408  t: 35  wall_t: 50  opt_step: 72000  frame: 60000  fps: 1200  total_reward: 77  total_reward_ma: 137.667  loss: 0.846078  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:13,731 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df metrics] final_return_ma: 137.667  strength: 115.807  max_strength: 178.14  final_strength: 55.14  sample_efficiency: 4.20638e-05  training_efficiency: 3.50532e-05  stability: 0.509145
(pid=3919) [2021-05-04 16:54:13,790 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 2481  t: 10  wall_t: 50  opt_step: 72000  frame: 60000  fps: 1200  total_reward: 15  total_reward_ma: 20.1667  loss: 177.188  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:13,801 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df metrics] final_return_ma: 20.1667  strength: -1.69333  max_strength: 8.14  final_strength: -6.86  sample_efficiency: -3.73327e-05  training_efficiency: -3.11105e-05  stability: -5.96969
(pid=3927) [2021-05-04 16:54:13,827 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 5260  t: 1  wall_t: 49  opt_step: 108000  frame: 90000  fps: 1836.73  total_reward: 10  total_reward_ma: 16.5556  loss: 35.1399  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:13,839 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df metrics] final_return_ma: 16.5556  strength: -5.30445  max_strength: 20.14  final_strength: -11.86  sample_efficiency: 3.63809e-05  training_efficiency: 3.03174e-05  stability: -0.560758
(pid=3919) [2021-05-04 16:54:13,812 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 2683  t: 9  wall_t: 50  opt_step: 72000  frame: 60000  fps: 1200  total_reward: 11  total_reward_ma: 25  loss: 47.0795  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:13,824 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df metrics] final_return_ma: 25  strength: 3.14  max_strength: 32.14  final_strength: -10.86  sample_efficiency: -2.65764e-05  training_efficiency: -2.2147e-05  stability: -0.548822
(pid=3927) [2021-05-04 16:54:14,580 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 3970  t: 12  wall_t: 50  opt_step: 72000  frame: 60000  fps: 1200  total_reward: 11  total_reward_ma: 22.3333  loss: 0.836874  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:14,598 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 22.3333  strength: 0.473333  max_strength: 33.14  final_strength: -10.86  sample_efficiency: -0.000242618  training_efficiency: -0.000202181  stability: -2.21168
(pid=3927) [2021-05-04 16:54:15,019 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 3702  t: 9  wall_t: 50  opt_step: 72000  frame: 60000  fps: 1200  total_reward: 9  total_reward_ma: 25.3333  loss: 51.1418  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:15,030 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 25.3333  strength: 3.47333  max_strength: 61.14  final_strength: -12.86  sample_efficiency: 3.17162e-05  training_efficiency: 2.64302e-05  stability: -1.72997
(pid=3927) [2021-05-04 16:54:15,035 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 3702  t: 9  wall_t: 50  opt_step: 72000  frame: 60000  fps: 1200  total_reward: 9  total_reward_ma: 25.3333  loss: 51.1418  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:15,046 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 25.3333  strength: 3.47333  max_strength: 61.14  final_strength: -12.86  sample_efficiency: 3.17162e-05  training_efficiency: 2.64302e-05  stability: -1.72997
(pid=3917) [2021-05-04 16:54:16,546 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 1748  t: 120  wall_t: 53  opt_step: 84000  frame: 70000  fps: 1320.75  total_reward: 200  total_reward_ma: 70.2857  loss: 0.312924  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:16,557 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 70.2857  strength: 48.4257  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 3.36278e-05  training_efficiency: 2.80232e-05  stability: 0.297438
(pid=3919) [2021-05-04 16:54:17,508 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 4451  t: 10  wall_t: 54  opt_step: 108000  frame: 90000  fps: 1666.67  total_reward: 10  total_reward_ma: 35  loss: 1.27574e+07  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:17,516 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 35  strength: 13.14  max_strength: 163.14  final_strength: -11.86  sample_efficiency: 0.000128859  training_efficiency: 0.000107383  stability: -0.644636
(pid=3919) [2021-05-04 16:54:17,519 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 4451  t: 10  wall_t: 54  opt_step: 108000  frame: 90000  fps: 1666.67  total_reward: 10  total_reward_ma: 35  loss: 1.27574e+07  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:17,526 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 35  strength: 13.14  max_strength: 163.14  final_strength: -11.86  sample_efficiency: 0.000128859  training_efficiency: 0.000107383  stability: -0.644636
(pid=3927) [2021-05-04 16:54:18,549 PID:6135 INFO __init__.py log_summary] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df] epi: 6295  t: 7  wall_t: 54  opt_step: 120000  frame: 100000  fps: 1851.85  total_reward: 12  total_reward_ma: 18.3  loss: 2.96285  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:18,557 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [train_df metrics] final_return_ma: 18.3  strength: -3.56  max_strength: 24.14  final_strength: -9.86  sample_efficiency: 2.10637e-06  training_efficiency: 1.75531e-06  stability: -0.903651
(pid=3919) [2021-05-04 16:54:18,903 PID:6064 INFO __init__.py log_summary] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df] epi: 1250  t: 164  wall_t: 55  opt_step: 120000  frame: 100000  fps: 1818.18  total_reward: 200  total_reward_ma: 155.7  loss: 0.0576436  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:18,911 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [train_df metrics] final_return_ma: 155.7  strength: 133.84  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 2.27543e-05  training_efficiency: 1.8962e-05  stability: 0.843139
(pid=3927) [2021-05-04 16:54:19,570 PID:6133 INFO __init__.py log_summary] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df] epi: 6169  t: 6  wall_t: 55  opt_step: 120000  frame: 100000  fps: 1818.18  total_reward: 10  total_reward_ma: 15.9  loss: 2.19336  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:19,581 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [train_df metrics] final_return_ma: 15.9  strength: -5.96  max_strength: 20.14  final_strength: -11.86  sample_efficiency: 3.11313e-05  training_efficiency: 2.59427e-05  stability: -0.17302
(pid=3917) [2021-05-04 16:54:20,922 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 2227  t: 67  wall_t: 57  opt_step: 84000  frame: 70000  fps: 1228.07  total_reward: 18  total_reward_ma: 120.714  loss: 1.49438  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:20,933 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df metrics] final_return_ma: 120.714  strength: 98.8543  max_strength: 178.14  final_strength: -3.86  sample_efficiency: 4.41089e-05  training_efficiency: 3.67574e-05  stability: 0.337491
(pid=3917) [2021-05-04 16:54:21,157 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 1647  t: 57  wall_t: 57  opt_step: 84000  frame: 70000  fps: 1228.07  total_reward: 11  total_reward_ma: 61.2857  loss: 2.20304  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:21,169 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df metrics] final_return_ma: 61.2857  strength: 39.4257  max_strength: 178.14  final_strength: -10.86  sample_efficiency: 2.36762e-05  training_efficiency: 1.97302e-05  stability: -0.0493655
(pid=3917) [2021-05-04 16:54:21,435 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 1969  t: 7  wall_t: 58  opt_step: 84000  frame: 70000  fps: 1206.9  total_reward: 123  total_reward_ma: 135.571  loss: 0.309378  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:21,446 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df metrics] final_return_ma: 135.571  strength: 113.711  max_strength: 178.14  final_strength: 101.14  sample_efficiency: 3.85343e-05  training_efficiency: 3.21119e-05  stability: 0.548097
(pid=3919) [2021-05-04 16:54:22,569 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 2912  t: 3  wall_t: 59  opt_step: 84000  frame: 70000  fps: 1186.44  total_reward: 45  total_reward_ma: 23.7143  loss: 1.50828e+06  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:22,581 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df metrics] final_return_ma: 23.7143  strength: 1.85429  max_strength: 23.14  final_strength: 23.14  sample_efficiency: 5.46896e-05  training_efficiency: 4.55747e-05  stability: -1.26378
(pid=3919) [2021-05-04 16:54:22,601 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 3138  t: 24  wall_t: 59  opt_step: 84000  frame: 70000  fps: 1186.44  total_reward: 36  total_reward_ma: 26.5714  loss: 58.8094  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:22,614 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df metrics] final_return_ma: 26.5714  strength: 4.71143  max_strength: 32.14  final_strength: 14.14  sample_efficiency: -9.05701e-06  training_efficiency: -7.54751e-06  stability: -1.44161
(pid=3927) [2021-05-04 16:54:22,668 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 4474  t: 28  wall_t: 58  opt_step: 84000  frame: 70000  fps: 1206.9  total_reward: 53  total_reward_ma: 26.7143  loss: 6.999  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:22,678 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 26.7143  strength: 4.85429  max_strength: 33.14  final_strength: 31.14  sample_efficiency: -7.18589e-06  training_efficiency: -5.98823e-06  stability: -14.493
(pid=3927) [2021-05-04 16:54:23,305 PID:6135 INFO __init__.py log_metrics] Trial 6 session 2 sarsa_epsilon_greedy_cartpole_t6_s2 [eval_df metrics] final_return_ma: 18.3  strength: -3.56  max_strength: 24.14  final_strength: -9.86  sample_efficiency: 2.10637e-06  training_efficiency: 1.75531e-06  stability: -0.903651
(pid=3927) [2021-05-04 16:54:23,306 PID:6135 INFO logger.py info] Session 2 done
(pid=3919) [2021-05-04 16:54:23,630 PID:6064 INFO __init__.py log_metrics] Trial 4 session 0 sarsa_epsilon_greedy_cartpole_t4_s0 [eval_df metrics] final_return_ma: 155.7  strength: 133.84  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 2.27543e-05  training_efficiency: 1.8962e-05  stability: 0.843139
(pid=3919) [2021-05-04 16:54:23,631 PID:6064 INFO logger.py info] Session 0 done
(pid=3927) [2021-05-04 16:54:23,807 PID:6133 INFO __init__.py log_metrics] Trial 6 session 0 sarsa_epsilon_greedy_cartpole_t6_s0 [eval_df metrics] final_return_ma: 15.9  strength: -5.96  max_strength: 20.14  final_strength: -11.86  sample_efficiency: 3.11313e-05  training_efficiency: 2.59427e-05  stability: -0.17302
(pid=3927) [2021-05-04 16:54:23,808 PID:6133 INFO logger.py info] Session 0 done
(pid=3927) [2021-05-04 16:54:23,912 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 4569  t: 8  wall_t: 59  opt_step: 84000  frame: 70000  fps: 1186.44  total_reward: 9  total_reward_ma: 23  loss: 4.62452  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:23,923 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 23  strength: 1.14  max_strength: 61.14  final_strength: -12.86  sample_efficiency: 5.98061e-05  training_efficiency: 4.98384e-05  stability: -3.41459
(pid=3919) [2021-05-04 16:54:24,088 PID:6065 INFO __init__.py log_summary] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df] epi: 5011  t: 5  wall_t: 60  opt_step: 120000  frame: 100000  fps: 1666.67  total_reward: 18  total_reward_ma: 33.3  loss: 22.0154  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:24,096 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [train_df metrics] final_return_ma: 33.3  strength: 11.44  max_strength: 163.14  final_strength: -3.86  sample_efficiency: 0.00013287  training_efficiency: 0.000110725  stability: -0.809572
(pid=3917) [2021-05-04 16:54:25,029 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 2578  t: 2  wall_t: 61  opt_step: 96000  frame: 80000  fps: 1311.48  total_reward: 24  total_reward_ma: 64.5  loss: 4.87856  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:25,037 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 64.5  strength: 42.64  max_strength: 178.14  final_strength: 2.14  sample_efficiency: 3.34953e-05  training_efficiency: 2.79127e-05  stability: 0.147442
(pid=3917) [2021-05-04 16:54:27,416 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 2301  t: 176  wall_t: 64  opt_step: 96000  frame: 80000  fps: 1250  total_reward: 200  total_reward_ma: 130.625  loss: 0.751387  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:27,423 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df metrics] final_return_ma: 130.625  strength: 108.765  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 3.76376e-05  training_efficiency: 3.13647e-05  stability: 0.333796
(pid=3917) [2021-05-04 16:54:27,477 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 2475  t: 22  wall_t: 64  opt_step: 96000  frame: 80000  fps: 1250  total_reward: 11  total_reward_ma: 120  loss: 2.86133  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:27,489 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df metrics] final_return_ma: 120  strength: 98.14  max_strength: 178.14  final_strength: -10.86  sample_efficiency: 3.88944e-05  training_efficiency: 3.2412e-05  stability: 0.464811
(pid=3917) [2021-05-04 16:54:28,199 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 1753  t: 107  wall_t: 64  opt_step: 96000  frame: 80000  fps: 1250  total_reward: 29  total_reward_ma: 57.25  loss: 0.341931  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:28,210 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df metrics] final_return_ma: 57.25  strength: 35.39  max_strength: 178.14  final_strength: 7.14  sample_efficiency: 2.33943e-05  training_efficiency: 1.94953e-05  stability: -0.0906588
(pid=3919) [2021-05-04 16:54:28,835 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 3406  t: 3  wall_t: 65  opt_step: 96000  frame: 80000  fps: 1230.77  total_reward: 16  total_reward_ma: 22.75  loss: 4523.71  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:28,845 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 3578  t: 14  wall_t: 65  opt_step: 96000  frame: 80000  fps: 1230.77  total_reward: 14  total_reward_ma: 25  loss: 85.0482  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:28,846 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df metrics] final_return_ma: 22.75  strength: 0.889999  max_strength: 23.14  final_strength: -5.86  sample_efficiency: 8.94132e-05  training_efficiency: 7.4511e-05  stability: -3.00616
(pid=3919) [2021-05-04 16:54:28,856 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df metrics] final_return_ma: 25  strength: 3.14  max_strength: 32.14  final_strength: -7.86  sample_efficiency: -1.58022e-05  training_efficiency: -1.31685e-05  stability: -1.06186
(pid=3919) [2021-05-04 16:54:29,118 PID:6065 INFO __init__.py log_metrics] Trial 4 session 1 sarsa_epsilon_greedy_cartpole_t4_s1 [eval_df metrics] final_return_ma: 33.3  strength: 11.44  max_strength: 163.14  final_strength: -3.86  sample_efficiency: 0.00013287  training_efficiency: 0.000110725  stability: -0.809572
(pid=3919) [2021-05-04 16:54:29,120 PID:6065 INFO logger.py info] Session 1 done
(pid=3927) [2021-05-04 16:54:29,237 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 5171  t: 14  wall_t: 65  opt_step: 96000  frame: 80000  fps: 1230.77  total_reward: 10  total_reward_ma: 24.625  loss: 9.63054  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:29,244 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 24.625  strength: 2.765  max_strength: 33.14  final_strength: -11.86  sample_efficiency: -1.77408e-05  training_efficiency: -1.4784e-05  stability: -1.56033
(pid=3927) [2021-05-04 16:54:29,685 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 5342  t: 9  wall_t: 65  opt_step: 96000  frame: 80000  fps: 1230.77  total_reward: 9  total_reward_ma: 21.25  loss: 15.0818  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:29,694 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 21.25  strength: -0.610001  max_strength: 61.14  final_strength: -12.86  sample_efficiency: -6.48569e-05  training_efficiency: -5.40475e-05  stability: -10.5288
(pid=3927) [2021-05-04 16:54:29,697 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 5342  t: 9  wall_t: 65  opt_step: 96000  frame: 80000  fps: 1230.77  total_reward: 9  total_reward_ma: 21.25  loss: 15.0818  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:29,704 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 21.25  strength: -0.610001  max_strength: 61.14  final_strength: -12.86  sample_efficiency: -6.48569e-05  training_efficiency: -5.40475e-05  stability: -10.5288
(pid=3917) [2021-05-04 16:54:30,729 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 2682  t: 111  wall_t: 67  opt_step: 108000  frame: 90000  fps: 1343.28  total_reward: 200  total_reward_ma: 79.5556  loss: 0.681929  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:30,736 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 79.5556  strength: 57.6956  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 2.5816e-05  training_efficiency: 2.15134e-05  stability: 0.152791
(pid=3917) [2021-05-04 16:54:32,767 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 2403  t: 1  wall_t: 69  opt_step: 108000  frame: 90000  fps: 1304.35  total_reward: 137  total_reward_ma: 131.333  loss: 172.692  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:32,774 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df metrics] final_return_ma: 131.333  strength: 109.473  max_strength: 178.14  final_strength: 115.14  sample_efficiency: 3.45377e-05  training_efficiency: 2.87814e-05  stability: 0.397784
(pid=3917) [2021-05-04 16:54:32,914 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 2645  t: 6  wall_t: 69  opt_step: 108000  frame: 90000  fps: 1304.35  total_reward: 66  total_reward_ma: 114  loss: 229.09  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:32,924 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df metrics] final_return_ma: 114  strength: 92.14  max_strength: 178.14  final_strength: 44.14  sample_efficiency: 3.74155e-05  training_efficiency: 3.11796e-05  stability: 0.457408
(pid=3917) [2021-05-04 16:54:33,652 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 1901  t: 179  wall_t: 70  opt_step: 108000  frame: 90000  fps: 1285.71  total_reward: 200  total_reward_ma: 73.1111  loss: 0.833828  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:33,659 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df metrics] final_return_ma: 73.1111  strength: 51.2511  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 1.86505e-05  training_efficiency: 1.55421e-05  stability: -0.0631535
(pid=3919) [2021-05-04 16:54:34,298 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 4001  t: 2  wall_t: 70  opt_step: 108000  frame: 90000  fps: 1285.71  total_reward: 35  total_reward_ma: 26.1111  loss: 2.78408e+06  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:34,305 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df metrics] final_return_ma: 26.1111  strength: 4.25111  max_strength: 32.14  final_strength: 13.14  sample_efficiency: -6.55907e-06  training_efficiency: -5.46589e-06  stability: -1.70701
(pid=3919) [2021-05-04 16:54:34,342 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 4119  t: 7  wall_t: 70  opt_step: 108000  frame: 90000  fps: 1285.71  total_reward: 14  total_reward_ma: 21.7778  loss: 2.48157  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:34,349 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df metrics] final_return_ma: 21.7778  strength: -0.0822228  max_strength: 23.14  final_strength: -7.86  sample_efficiency: -0.000742276  training_efficiency: -0.000618563  stability: -6.58427
(pid=3927) [2021-05-04 16:54:34,524 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 5895  t: 5  wall_t: 70  opt_step: 108000  frame: 90000  fps: 1285.71  total_reward: 9  total_reward_ma: 22.8889  loss: 69.1189  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:34,531 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 22.8889  strength: 1.02889  max_strength: 33.14  final_strength: -12.86  sample_efficiency: -5.78095e-05  training_efficiency: -4.81745e-05  stability: -2.9783
(pid=3927) [2021-05-04 16:54:35,013 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 6177  t: 7  wall_t: 70  opt_step: 108000  frame: 90000  fps: 1285.71  total_reward: 39  total_reward_ma: 23.2222  loss: 13.1122  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:35,023 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 23.2222  strength: 1.36222  max_strength: 61.14  final_strength: 17.14  sample_efficiency: 4.13497e-05  training_efficiency: 3.44581e-05  stability: -17.8524
(pid=3917) [2021-05-04 16:54:36,339 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 3083  t: 10  wall_t: 72  opt_step: 120000  frame: 100000  fps: 1388.89  total_reward: 10  total_reward_ma: 72.6  loss: 8.52186  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:36,350 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 72.6  strength: 50.74  max_strength: 178.14  final_strength: -11.86  sample_efficiency: 2.61857e-05  training_efficiency: 2.18214e-05  stability: 0.0775334
(pid=3917) [2021-05-04 16:54:36,354 PID:6088 INFO __init__.py log_summary] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df] epi: 3083  t: 10  wall_t: 72  opt_step: 120000  frame: 100000  fps: 1388.89  total_reward: 10  total_reward_ma: 72.6  loss: 8.52186  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:36,364 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [train_df metrics] final_return_ma: 72.6  strength: 50.74  max_strength: 178.14  final_strength: -11.86  sample_efficiency: 2.61857e-05  training_efficiency: 2.18214e-05  stability: 0.0775334
(pid=3917) [2021-05-04 16:54:38,126 PID:6084 INFO __init__.py log_summary] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df] epi: 2614  t: 60  wall_t: 74  opt_step: 120000  frame: 100000  fps: 1351.35  total_reward: 56  total_reward_ma: 123.8  loss: 1.64251  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:38,133 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [train_df metrics] final_return_ma: 123.8  strength: 101.94  max_strength: 178.14  final_strength: 34.14  sample_efficiency: 3.37159e-05  training_efficiency: 2.80966e-05  stability: 0.385949
(pid=3917) [2021-05-04 16:54:38,334 PID:6089 INFO __init__.py log_summary] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df] epi: 2815  t: 7  wall_t: 74  opt_step: 120000  frame: 100000  fps: 1351.35  total_reward: 200  total_reward_ma: 122.6  loss: 0.631643  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:38,342 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [train_df metrics] final_return_ma: 122.6  strength: 100.74  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 3.25676e-05  training_efficiency: 2.71397e-05  stability: 0.486289
(pid=3917) [2021-05-04 16:54:39,308 PID:6083 INFO __init__.py log_summary] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df] epi: 1971  t: 50  wall_t: 75  opt_step: 120000  frame: 100000  fps: 1333.33  total_reward: 15  total_reward_ma: 67.3  loss: 1.24413  lr: 0.05  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3917) [2021-05-04 16:54:39,321 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [train_df metrics] final_return_ma: 67.3  strength: 45.44  max_strength: 178.14  final_strength: -6.86  sample_efficiency: 1.87811e-05  training_efficiency: 1.56509e-05  stability: -0.0536357
(pid=3919) [2021-05-04 16:54:40,071 PID:6066 INFO __init__.py log_summary] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df] epi: 4388  t: 32  wall_t: 76  opt_step: 120000  frame: 100000  fps: 1315.79  total_reward: 15  total_reward_ma: 25  loss: 107.492  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:40,085 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [train_df metrics] final_return_ma: 25  strength: 3.14  max_strength: 32.14  final_strength: -6.86  sample_efficiency: -1.01768e-05  training_efficiency: -8.48063e-06  stability: -1.30005
(pid=3919) [2021-05-04 16:54:40,115 PID:6067 INFO __init__.py log_summary] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df] epi: 4644  t: 30  wall_t: 76  opt_step: 120000  frame: 100000  fps: 1315.79  total_reward: 39  total_reward_ma: 23.5  loss: 40.428  lr: 0.01  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3919) [2021-05-04 16:54:40,123 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [train_df metrics] final_return_ma: 23.5  strength: 1.64  max_strength: 23.14  final_strength: 17.14  sample_efficiency: 4.39444e-05  training_efficiency: 3.66203e-05  stability: -71.9724
(pid=3927) [2021-05-04 16:54:40,163 PID:6136 INFO __init__.py log_summary] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df] epi: 6583  t: 69  wall_t: 76  opt_step: 120000  frame: 100000  fps: 1315.79  total_reward: 44  total_reward_ma: 25  loss: 0.995696  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:40,172 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [train_df metrics] final_return_ma: 25  strength: 3.14  max_strength: 33.14  final_strength: 22.14  sample_efficiency: -9.99731e-06  training_efficiency: -8.33108e-06  stability: -8.50325
(pid=3917) [2021-05-04 16:54:40,353 PID:6088 INFO __init__.py log_metrics] Trial 5 session 2 sarsa_epsilon_greedy_cartpole_t5_s2 [eval_df metrics] final_return_ma: 72.6  strength: 50.74  max_strength: 178.14  final_strength: -11.86  sample_efficiency: 2.61857e-05  training_efficiency: 2.18214e-05  stability: 0.0775334
(pid=3917) [2021-05-04 16:54:40,355 PID:6088 INFO logger.py info] Session 2 done
(pid=3927) [2021-05-04 16:54:40,770 PID:6134 INFO __init__.py log_summary] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df] epi: 6705  t: 7  wall_t: 76  opt_step: 120000  frame: 100000  fps: 1315.79  total_reward: 19  total_reward_ma: 22.8  loss: 0.850267  lr: 0.1  explore_var: 0.05  entropy_coef: nan  entropy: nan  grad_norm: nan
(pid=3927) [2021-05-04 16:54:40,782 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [train_df metrics] final_return_ma: 22.8  strength: 0.939999  max_strength: 61.14  final_strength: -2.86  sample_efficiency: 5.0888e-05  training_efficiency: 4.24066e-05  stability: -8.1354
(pid=3917) [2021-05-04 16:54:42,296 PID:6084 INFO __init__.py log_metrics] Trial 5 session 1 sarsa_epsilon_greedy_cartpole_t5_s1 [eval_df metrics] final_return_ma: 123.8  strength: 101.94  max_strength: 178.14  final_strength: 34.14  sample_efficiency: 3.37159e-05  training_efficiency: 2.80966e-05  stability: 0.385949
(pid=3917) [2021-05-04 16:54:42,297 PID:6084 INFO logger.py info] Session 1 done
(pid=3917) [2021-05-04 16:54:42,509 PID:6089 INFO __init__.py log_metrics] Trial 5 session 3 sarsa_epsilon_greedy_cartpole_t5_s3 [eval_df metrics] final_return_ma: 122.6  strength: 100.74  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 3.25676e-05  training_efficiency: 2.71397e-05  stability: 0.486289
(pid=3917) [2021-05-04 16:54:42,510 PID:6089 INFO logger.py info] Session 3 done
(pid=3917) [2021-05-04 16:54:43,518 PID:6083 INFO __init__.py log_metrics] Trial 5 session 0 sarsa_epsilon_greedy_cartpole_t5_s0 [eval_df metrics] final_return_ma: 67.3  strength: 45.44  max_strength: 178.14  final_strength: -6.86  sample_efficiency: 1.87811e-05  training_efficiency: 1.56509e-05  stability: -0.0536357
(pid=3917) [2021-05-04 16:54:43,520 PID:6083 INFO logger.py info] Session 0 done
(pid=3919) [2021-05-04 16:54:44,048 PID:6066 INFO __init__.py log_metrics] Trial 4 session 2 sarsa_epsilon_greedy_cartpole_t4_s2 [eval_df metrics] final_return_ma: 25  strength: 3.14  max_strength: 32.14  final_strength: -6.86  sample_efficiency: -1.01768e-05  training_efficiency: -8.48063e-06  stability: -1.30005
(pid=3919) [2021-05-04 16:54:44,048 PID:6067 INFO __init__.py log_metrics] Trial 4 session 3 sarsa_epsilon_greedy_cartpole_t4_s3 [eval_df metrics] final_return_ma: 23.5  strength: 1.64  max_strength: 23.14  final_strength: 17.14  sample_efficiency: 4.39444e-05  training_efficiency: 3.66203e-05  stability: -71.9724
(pid=3919) [2021-05-04 16:54:44,049 PID:6066 INFO logger.py info] Session 2 done
(pid=3927) [2021-05-04 16:54:44,077 PID:6136 INFO __init__.py log_metrics] Trial 6 session 3 sarsa_epsilon_greedy_cartpole_t6_s3 [eval_df metrics] final_return_ma: 25  strength: 3.14  max_strength: 33.14  final_strength: 22.14  sample_efficiency: -9.99731e-06  training_efficiency: -8.33108e-06  stability: -8.50325
(pid=3927) [2021-05-04 16:54:44,077 PID:6136 INFO logger.py info] Session 3 done
(pid=3919) [2021-05-04 16:54:44,050 PID:6067 INFO logger.py info] Session 3 done
(pid=3927) [2021-05-04 16:54:44,672 PID:6134 INFO __init__.py log_metrics] Trial 6 session 1 sarsa_epsilon_greedy_cartpole_t6_s1 [eval_df metrics] final_return_ma: 22.8  strength: 0.939999  max_strength: 61.14  final_strength: -2.86  sample_efficiency: 5.0888e-05  training_efficiency: 4.24066e-05  stability: -8.1354
(pid=3927) [2021-05-04 16:54:44,673 PID:6134 INFO logger.py info] Session 1 done
Result for ray_trainable_5_agent.0.net.optim_spec.lr=0.05,trial_index=5:
  date: 2021-05-04_16-54-46
  done: false
  experiment_id: 8ffcfbad795a4beea99cf2e3c64c9215
  hostname: furanzu
  iterations_since_restore: 1
  node_ip: 220.67.127.75
  pid: 3917
  time_since_restore: 83.9136130809784
  time_this_iter_s: 83.9136130809784
  time_total_s: 83.9136130809784
  timestamp: 1620114886
  timesteps_since_restore: 0
  training_iteration: 1
  trial_data:
    '5':
      agent.0.net.optim_spec.lr: 0.05
      consistency: -1.2436954030885268
      final_return_ma: 96.57500076293945
      final_strength: 48.38999938964844
      max_strength: 178.13999938964844
      sample_efficiency: 2.7812577627628343e-05
      stability: 0.2240338921546936
      strength: 74.71500301361084
      training_efficiency: 2.3177149614639347e-05

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 12/16 CPUs, 0/2 GPUs
Memory usage on this node: 5.2/33.6 GB
Result logdir: /home/iwan/ray_results/sarsa_epsilon_greedy_cartpole
Number of trials: 7 ({'TERMINATED': 4, 'RUNNING': 3})
RUNNING trials:
 - ray_trainable_4_agent.0.net.optim_spec.lr=0.01,trial_index=4:    RUNNING
 - ray_trainable_5_agent.0.net.optim_spec.lr=0.05,trial_index=5:    RUNNING, [4 CPUs, 0 GPUs], [pid=3917], 83 s, 1 iter
 - ray_trainable_6_agent.0.net.optim_spec.lr=0.1,trial_index=6: RUNNING
TERMINATED trials:
 - ray_trainable_0_agent.0.net.optim_spec.lr=0.0005,trial_index=0:  TERMINATED, [4 CPUs, 0 GPUs], [pid=3926], 101 s, 1 iter
 - ray_trainable_1_agent.0.net.optim_spec.lr=0.001,trial_index=1:   TERMINATED, [4 CPUs, 0 GPUs], [pid=3913], 101 s, 1 iter
 - ray_trainable_2_agent.0.net.optim_spec.lr=0.001,trial_index=2:   TERMINATED, [4 CPUs, 0 GPUs], [pid=3925], 101 s, 1 iter
 - ray_trainable_3_agent.0.net.optim_spec.lr=0.005,trial_index=3:   TERMINATED, [4 CPUs, 0 GPUs], [pid=3914], 102 s, 1 iter

(pid=3917) [2021-05-04 16:54:46,943 PID:3917 INFO logger.py info] Trial 5 done
Result for ray_trainable_4_agent.0.net.optim_spec.lr=0.01,trial_index=4:
  date: 2021-05-04_16-54-47
  done: false
  experiment_id: 15bdd3fe8a5348959d51f1b0ce8697eb
  hostname: furanzu
  iterations_since_restore: 1
  node_ip: 220.67.127.75
  pid: 3919
  time_since_restore: 84.52337956428528
  time_this_iter_s: 84.52337956428528
  time_total_s: 84.52337956428528
  timestamp: 1620114887
  timesteps_since_restore: 0
  training_iteration: 1
  trial_data:
    '4':
      agent.0.net.optim_spec.lr: 0.01
      consistency: -2.955836353229779
      final_return_ma: 59.374999046325684
      final_strength: 46.13999938964844
      max_strength: 99.13999938964844
      sample_efficiency: 4.734792105409724e-05
      stability: -18.30973031371832
      strength: 37.5149986743927
      training_efficiency: 3.945659932469425e-05

(pid=3919) [2021-05-04 16:54:47,500 PID:3919 INFO logger.py info] Trial 4 done
Result for ray_trainable_6_agent.0.net.optim_spec.lr=0.1,trial_index=6:
  date: 2021-05-04_16-54-48
  done: false
  experiment_id: 4da3531cb0aa45c1be14d4b3c595e945
  hostname: furanzu
  iterations_since_restore: 1
  node_ip: 220.67.127.75
  pid: 3927
  time_since_restore: 84.44849443435669
  time_this_iter_s: 84.44849443435669
  time_total_s: 84.44849443435669
  timestamp: 1620114888
  timesteps_since_restore: 0
  training_iteration: 1
  trial_data:
    '6':
      agent.0.net.optim_spec.lr: 0.1
      consistency: 17.4483187290332
      final_return_ma: 20.499999523162842
      final_strength: -0.6100006103515625
      max_strength: 34.63999938964844
      sample_efficiency: 1.8532071067056677e-05
      stability: -4.428830206394196
      strength: -1.3600005954504013
      training_efficiency: 1.544339670545014e-05

(pid=3927) [2021-05-04 16:54:48,028 PID:3927 INFO logger.py info] Trial 6 done
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/2 GPUs
Memory usage on this node: 4.0/33.6 GB
Result logdir: /home/iwan/ray_results/sarsa_epsilon_greedy_cartpole
Number of trials: 7 ({'TERMINATED': 7})
TERMINATED trials:
 - ray_trainable_0_agent.0.net.optim_spec.lr=0.0005,trial_index=0:  TERMINATED, [4 CPUs, 0 GPUs], [pid=3926], 101 s, 1 iter
 - ray_trainable_1_agent.0.net.optim_spec.lr=0.001,trial_index=1:   TERMINATED, [4 CPUs, 0 GPUs], [pid=3913], 101 s, 1 iter
 - ray_trainable_2_agent.0.net.optim_spec.lr=0.001,trial_index=2:   TERMINATED, [4 CPUs, 0 GPUs], [pid=3925], 101 s, 1 iter
 - ray_trainable_3_agent.0.net.optim_spec.lr=0.005,trial_index=3:   TERMINATED, [4 CPUs, 0 GPUs], [pid=3914], 102 s, 1 iter
 - ray_trainable_4_agent.0.net.optim_spec.lr=0.01,trial_index=4:    TERMINATED, [4 CPUs, 0 GPUs], [pid=3919], 84 s, 1 iter
 - ray_trainable_5_agent.0.net.optim_spec.lr=0.05,trial_index=5:    TERMINATED, [4 CPUs, 0 GPUs], [pid=3917], 83 s, 1 iter
 - ray_trainable_6_agent.0.net.optim_spec.lr=0.1,trial_index=6: TERMINATED, [4 CPUs, 0 GPUs], [pid=3927], 84 s, 1 iter

[2021-05-04 16:54:52,408 PID:3860 INFO analysis.py analyze_experiment] All experiment data zipped to data/sarsa_epsilon_greedy_cartpole_2021_05_04_165139.zip
[2021-05-04 16:54:52,408 PID:3860 INFO logger.py info] Experiment done

kengz commented 2 years ago

Related to #451. This is now resolved in version 4.2.4.