pkel / cpr

consensus protocol research

Simulator failure when training Tailstorm with k=2 #21

Open · pkel opened this issue 2 years ago

Observed this error after running `make train-online` on `247c11f`.

```
## Environment (before vectorization) ##
Tailstorm with k=2, constant rewards, and optimal sub-block selection; SSZ'16-like attack space; α=0.25 attacker
public_blocks: 0
private_blocks: 0
diff_blocks: 0
public_votes: 1
private_votes_inclusive: 2
private_votes_exclusive: 1
public_depth: 0
private_depth_inclusive: 1
private_depth_exclusive: 1
event: 2
Actions: (0) Adopt_Prolong | (1) Override_Prolong | (2) Match_Prolong | (3) Wait_Prolong | (4) Adopt_Proceed | (5) Override_Proceed | (6) Match_Proceed | (7) Wait_Proceed
```
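For orientation: the eight actions are the cross product of the four SSZ'16-style selfish-mining decisions (Adopt, Override, Match, Wait) with a Prolong/Proceed choice, with the Prolong variants at indices 0-3 and the Proceed variants at 4-7. A hypothetical sketch of that layout (type and value names are mine, not from the cpr source):

```ocaml
(* Hypothetical reconstruction of the action space from the labels
   printed above; not the actual cpr source. *)
type decision = Adopt | Override | Match | Wait
type timing = Prolong | Proceed

(* Enumerate the Prolong variants first (indices 0-3), then the
   Proceed variants (indices 4-7), matching the printout. *)
let actions : (decision * timing) list =
  List.concat_map
    (fun t -> List.map (fun d -> (d, t)) [ Adopt; Override; Match; Wait ])
    [ Prolong; Proceed ]
```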
```
## Training ##
Using cpu device
-----------------------------------
| rollout/           |            |
|    ep_len_mean     | 248        |
|    ep_rew_mean     | 0.63059205 |
| time/              |            |
|    fps             | 10568      |
|    iterations      | 1          |
|    time_elapsed    | 23         |
|    total_timesteps | 245760     |
-----------------------------------
Process ForkServerProcess-20:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 29, in _worker
    observation, reward, done, info = env.step(data)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 208, in step
    obs, reward, done, was_info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 184, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 159, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 84, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/gym/wrappers/order_enforcing.py", line 11, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/envs.py", line 47, in step
    obs, r, d, i = engine.step(self.ocaml_env, a)
  File "ocaml/gym/bridge.ml", line 105, in Dune__exe__Bridge.(fun):105
  File "ocaml/gym/engine.ml", line 183, in Dune__exe__Engine.of_module.step:183
  File "ocaml/protocols/tailstorm_ssz.ml", line 293, in Cpr_protocols__Tailstorm_ssz.Make.Agent.apply:293
  File "ocaml/protocols/tailstorm.ml", line 519, in Cpr_protocols__Tailstorm.Make.Honest.next_summary':519
  File "ocaml/protocols/tailstorm.ml", line 415, in Cpr_protocols__Tailstorm.Make.Honest.optimal_quorum:415
  File "ocaml/protocols/combinatorics.ml", line 17, in Cpr_protocols__Combinatorics.n_choose_k:17
ValueError: (Division_by_zero)
```

The worker process dies on this exception, which closes its end of the pipe; the parent's `remote.recv()` in `SubprocVecEnv` then fails with an `EOFError`:

```
Traceback (most recent call last):
  File "/home/patrik/devel/cpr/python/train/ppo.py", line 315, in <module>
    model.learn(
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/ppo/ppo.py", line 314, in learn
    return super().learn(
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 251, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 185, in collect_rollouts
    if callback.on_step() is False:
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 192, in _on_step
    continue_training = callback.on_step() and continue_training
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/home/patrik/devel/cpr/python/train/ppo.py", line 232, in _on_step
    r = super()._on_step()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 435, in _on_step
    episode_rewards, episode_lengths = evaluate_policy(
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/evaluation.py", line 87, in evaluate_policy
    observations, rewards, dones, infos = env.step(actions)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 162, in step
    return self.step_wait()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/vec_monitor.py", line 76, in step_wait
    obs, rewards, dones, infos = self.venv.step_wait()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 120, in step_wait
    results = [remote.recv() for remote in self.remotes]
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 120, in <listcomp>
    results = [remote.recv() for remote in self.remotes]
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
```
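The worker trace bottoms out in `Cpr_protocols__Combinatorics.n_choose_k` raising `Division_by_zero`. One plausible mechanism, assuming `n_choose_k` is implemented with factorials on native ints (I have not verified this against `combinatorics.ml`): OCaml ints are 63-bit and wrap silently on overflow, and since 64! is divisible by 2^63, a naive `fact 64` wraps to exactly 0. As soon as `optimal_quorum` requests a binomial whose denominator involves a factorial of 64 or more, `(/)` raises `Division_by_zero`. A minimal sketch of that failure mode and an overflow-resistant alternative (all names hypothetical):

```ocaml
(* Hypothetical sketch; implementation assumed, not taken from cpr's
   combinatorics.ml. *)

(* Naive factorial on native ints: wraps silently on overflow. *)
let rec fact n = if n <= 1 then 1 else n * fact (n - 1)

(* On 64-bit platforms OCaml ints have 63 bits, and 64! is divisible
   by 2^63, so [fact 64] evaluates to 0. Whenever k >= 64 or
   n - k >= 64, the denominator below is 0 and (/) raises
   Division_by_zero. *)
let n_choose_k n k = fact n / (fact k * fact (n - k))

(* Multiplicative form: keeps intermediates as small as possible.
   Each division is exact, because after step i the accumulator
   holds the integer C(n - k + i, i). *)
let n_choose_k' n k =
  if k < 0 || k > n then 0
  else
    let k = min k (n - k) in
    let acc = ref 1 in
    for i = 1 to k do
      acc := !acc * (n - k + i) / i
    done;
    !acc
```

For example, `n_choose_k 70 3` raises `Division_by_zero` (because `fact 67` wraps to 0), while `n_choose_k' 70 3` returns 54740. If the real implementation turns out to be factorial-based, the multiplicative form (or arbitrary-precision ints via Zarith) would avoid the wraparound; if not, the `Division_by_zero` needs a different explanation.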