werner-duvaud / muzero-general

MuZero
https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation

Unhandled errors in replay_buffer.py #111

Closed schwab closed 3 years ago

schwab commented 3 years ago

Recently, while training, I started getting the following errors:

2021-01-11 16:34:00,823.ERROR worker.py:980 -- Possible unhandled error from worker: ray::ReplayBuffer.get_batch() (pid=3149092, ip=192.168.1.175)
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
  File "/home/mcstar_dev/project/muzero-general/replay_buffer.py", line 83, in get_batch
    game_id, game_history, game_prob = self.sample_game()
  File "/home/mcstar_dev/project/muzero-general/replay_buffer.py", line 149, in sample_game
    game_index = numpy.random.choice(len(self.buffer), p=game_probs)
  File "mtrand.pyx", line 928, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
2021-01-11 16:34:20,339.ERROR worker.py:980 -- Possible unhandled error from worker: ray::Trainer.continuous_update_weights() (pid=3149069, ip=192.168.1.175)
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
  File "/home/mcstar_dev/project/muzero-general/trainer.py", line 71, in continuous_update_weights
    index_batch, batch = ray.get(next_batch)
ray.exceptions.RayTaskError(ValueError): ray::ReplayBuffer.get_batch() (pid=3149092, ip=192.168.1.175)
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
  File "/home/mcstar_dev/project/muzero-general/replay_buffer.py", line 83, in get_batch
    game_id, game_history, game_prob = self.sample_game()
  File "/home/mcstar_dev/project/muzero-general/replay_buffer.py", line 149, in sample_game
    game_index = numpy.random.choice(len(self.buffer), p=game_probs)
  File "mtrand.pyx", line 928, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
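The failing call is easy to reproduce in isolation. My assumption about the failure mode: once any game priority in the buffer goes NaN, the normalized sampling vector passed to `numpy.random.choice` contains NaN, and numpy rejects it with exactly this `ValueError`:

```python
import numpy as np

# Minimal reproduction of the sampling failure: one NaN priority
# poisons the probability vector handed to numpy.random.choice.
game_probs = np.array([0.5, np.nan, 0.5])
try:
    np.random.choice(len(game_probs), p=game_probs)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

So the sampler itself is fine; the question is how NaN priorities got into the buffer in the first place.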

My current replay buffer settings are:

        ### Replay Buffer
        self.replay_buffer_size = 200  # Number of self-play games to keep in the replay buffer
        self.num_unroll_steps = 50  # Number of game moves to keep for every batch element
        self.td_steps = 90  # Number of steps in the future to take into account for calculating the target value
        self.PER = True  # Prioritized Replay (See paper appendix Training), select in priority the elements in the replay buffer which are unexpected for the network
        self.PER_alpha = 0.99  # How much prioritization is used, 0 corresponding to the uniform case, paper suggests 1

        # Reanalyze (See paper appendix Reanalyse)
        self.use_last_model_value = True  # Use the last model to provide a fresher, stable n-step value (See paper appendix Reanalyze)
        self.reanalyse_on_gpu = False
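With `PER = True`, sampling weight comes from the stored game priorities, so a diverging loss can write NaN priorities back into the buffer. A defensive variant of the sampling step (a sketch, not the repo's actual `sample_game` code; `safe_game_probs` is a hypothetical helper) would fall back to uniform sampling when the priorities are no longer finite:

```python
import numpy as np

def safe_game_probs(priorities):
    """Hypothetical guard: if priorities are NaN/inf or sum to zero
    (e.g. after the loss diverged), fall back to uniform sampling
    instead of letting numpy.random.choice raise ValueError."""
    p = np.asarray(priorities, dtype=np.float64)
    if not np.all(np.isfinite(p)) or p.sum() <= 0:
        return np.full(len(p), 1.0 / len(p))
    return p / p.sum()

probs = safe_game_probs([1.0, np.nan, 3.0])
game_index = np.random.choice(len(probs), p=probs)
```

Note this only masks the symptom: the underlying problem is still whatever drove the priorities to NaN (in my case, the learning rate).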
schwab commented 3 years ago

Note: after the above error, every loss value is NaN, so training is effectively dead at that point.

schwab commented 3 years ago

This problem went away when I reduced the learning rate to something reasonable (like 0.005).
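For context, the learning rate lives in the same game config as the replay-buffer block above. A fragment like the following is where the change goes (values are illustrative, and I'm assuming the attribute names match the `lr_init` / `lr_decay_rate` / `lr_decay_steps` style used in muzero-general's example configs):

```python
        ### Training (illustrative fragment; attribute names assumed)
        self.lr_init = 0.005       # lowered from a larger value that diverged
        self.lr_decay_rate = 0.9
        self.lr_decay_steps = 10000
```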

Follow-up: I've also learned that in my particular problem space, the game MuZero is learning runs in real time, and I only have 2 instances of the game engine to learn from at a time. Thus, in the beginning, the replay buffer only holds a few games to analyze. Over time the buffer fills, so setting the batch_size to something like 100 or more helps prevent the training loop from overtraining on the small number of available games in the early training epochs.
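The same idea can be expressed as a warm-up gate: with only a couple of real-time engines feeding self-play, hold off on optimizer steps until the buffer holds enough games. This is an illustrative sketch, not muzero-general's API; `ready_to_train` and the threshold are made-up names:

```python
def ready_to_train(buffer, min_games_before_training=20):
    """Hypothetical gate: delay training until the replay buffer
    holds a minimum number of self-play games, so early epochs
    don't overfit the handful of trajectories collected so far."""
    return len(buffer) >= min_games_before_training

print(ready_to_train(["game_1", "game_2"]))              # only two games so far
print(ready_to_train([f"game_{i}" for i in range(25)]))  # buffer has filled up
```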