Observed this error after running `make train-online` on commit `247c11f`.
## Environment (before vectorization) ##
Tailstorm with k=2, constant rewards, and optimal sub-block selection; SSZ'16-like attack space; α=0.25 attacker
public_blocks: 0
private_blocks: 0
diff_blocks: 0
public_votes: 1
private_votes_inclusive: 2
private_votes_exclusive: 1
public_depth: 0
private_depth_inclusive: 1
private_depth_exclusive: 1
event: 2
Actions: (0) Adopt_Prolong | (1) Override_Prolong | (2) Match_Prolong | (3) Wait_Prolong | (4) Adopt_Proceed | (5) Override_Proceed | (6) Match_Proceed | (7) Wait_Proceed
## Training ##
Using cpu device
-----------------------------------
| rollout/ | |
| ep_len_mean | 248 |
| ep_rew_mean | 0.63059205 |
| time/ | |
| fps | 10568 |
| iterations | 1 |
| time_elapsed | 23 |
| total_timesteps | 245760 |
-----------------------------------
Process ForkServerProcess-20:
Traceback (most recent call last):
File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 29, in _worker
observation, reward, done, info = env.step(data)
File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 208, in step
obs, reward, done, was_info = self.env.step(action)
File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 184, in step
obs, reward, done, info = self.env.step(action)
File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 159, in step
obs, reward, done, info = self.env.step(action)
File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 84, in step
obs, reward, done, info = self.env.step(action)
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/gym/wrappers/order_enforcing.py", line 11, in step
observation, reward, done, info = self.env.step(action)
File "/home/patrik/devel/cpr/python/gym/cpr_gym/envs.py", line 47, in step
obs, r, d, i = engine.step(self.ocaml_env, a)
File "ocaml/gym/bridge.ml", line 105, in Dune__exe__Bridge.(fun):105
File "ocaml/gym/engine.ml", line 183, in Dune__exe__Engine.of_module.step:183
File "ocaml/protocols/tailstorm_ssz.ml", line 293, in Cpr_protocols__Tailstorm_ssz.Make.Agent.apply:293
File "ocaml/protocols/tailstorm.ml", line 519, in Cpr_protocols__Tailstorm.Make.Honest.next_summary':519
File "ocaml/protocols/tailstorm.ml", line 415, in Cpr_protocols__Tailstorm.Make.Honest.optimal_quorum:415
File "ocaml/protocols/combinatorics.ml", line 17, in Cpr_protocols__Combinatorics.n_choose_k:17
ValueError: (Division_by_zero)
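
The `ValueError: (Division_by_zero)` is OCaml's `Division_by_zero` exception crossing the bridge into Python; the actual failure is inside `n_choose_k` (ocaml/protocols/combinatorics.ml:17), reached via `optimal_quorum` during sub-block selection. I have not checked the real implementation, but one plausible mechanism is a factorial-based binomial coefficient: OCaml's native `int` wraps silently on overflow, and for n ≥ 64 the factorial contains at least 63 factors of two, so on a 64-bit build it wraps to exactly 0 and the division blows up. A minimal sketch of that failure mode, assuming such an implementation (hypothetical code, not the actual `combinatorics.ml`):

```ocaml
(* Hypothetical sketch -- NOT the actual ocaml/protocols/combinatorics.ml.
   Shows how a factorial-based n_choose_k can raise Division_by_zero. *)

let rec factorial n = if n <= 1 then 1 else n * factorial (n - 1)

(* Native ints wrap on overflow: for n >= 64, n! contains at least 63
   factors of two, so [factorial n] wraps to exactly 0 on a 64-bit build,
   and any denominator containing such a factorial divides by zero. *)
let n_choose_k n k = factorial n / (factorial k * factorial (n - k))

let () =
  Printf.printf "C(10,3) = %d\n" (n_choose_k 10 3); (* 120: fine *)
  Printf.printf "C(70,4) = %d\n" (n_choose_k 70 4)  (* raises Division_by_zero *)
```

If that is indeed the cause, a multiplicative formulation (accumulating `acc * (n - i + 1) / i` for i = 1..k) or an arbitrary-precision integer type such as Zarith would avoid the overflow-to-zero. The training process itself only sees the worker disappear: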
Traceback (most recent call last):
File "/home/patrik/devel/cpr/python/train/ppo.py", line 315, in <module>
model.learn(
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/ppo/ppo.py", line 314, in learn
return super().learn(
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 251, in learn
continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 185, in collect_rollouts
if callback.on_step() is False:
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
return self._on_step()
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 192, in _on_step
continue_training = callback.on_step() and continue_training
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
return self._on_step()
File "/home/patrik/devel/cpr/python/train/ppo.py", line 232, in _on_step
r = super()._on_step()
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 435, in _on_step
episode_rewards, episode_lengths = evaluate_policy(
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/evaluation.py", line 87, in evaluate_policy
observations, rewards, dones, infos = env.step(actions)
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 162, in step
return self.step_wait()
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/vec_monitor.py", line 76, in step_wait
obs, rewards, dones, infos = self.venv.step_wait()
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 120, in step_wait
results = [remote.recv() for remote in self.remotes]
File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 120, in <listcomp>
results = [remote.recv() for remote in self.remotes]
File "/usr/lib64/python3.9/multiprocessing/connection.py", line 255, in recv
buf = self._recv_bytes()
File "/usr/lib64/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
buf = self._recv(4)
File "/usr/lib64/python3.9/multiprocessing/connection.py", line 388, in _recv
raise EOFError
EOFError
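
The `EOFError` is a secondary symptom: the `ForkServerProcess` worker died on the OCaml exception, its end of the pipe closed, and `remote.recv()` in `SubprocVecEnv.step_wait` then fails in the learner process. Note from the second traceback that the crash was triggered by the periodic evaluation callback (`evaluate_policy` called via `_on_step`), i.e. in the evaluation environment rather than in rollout collection itself.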