Closed by mw66 8 years ago
oh:
random.randint(0, self.num_actions-1)
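The `-1` matters because `random.randint(a, b)` includes both endpoints. A minimal sketch of the bound, assuming a hypothetical `num_actions = 4` and a helper name `sample_action` that is not from the repository:

```python
import random

num_actions = 4  # hypothetical action count, for illustration only


def sample_action(num_actions):
    # random.randint includes BOTH endpoints, so the upper bound must be
    # num_actions - 1 to stay inside the valid indices 0..num_actions-1.
    return random.randint(0, num_actions - 1)


samples = [sample_action(num_actions) for _ in range(1000)]
in_range = all(0 <= a < num_actions for a in samples)
```

`random.randrange(num_actions)` would give the same range without the explicit `-1`.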
DeepMind uses the null action in their code and evaluations, so I would stick with this. The other option is "human starts", which was introduced in the Gorila paper. You are welcome to submit a pull request for this.
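For illustration, null-action random starts could be sketched roughly as below. This is not the repository's code: `StubEnv`, `random_starts`, `max_random_starts` and the cap of 30 are hypothetical, and it assumes action index 0 is the no-op.

```python
import random


class StubEnv:
    """Hypothetical stand-in for the real environment; it only records calls."""

    def __init__(self):
        self.actions_taken = []

    def act(self, action):
        self.actions_taken.append(action)
        return 0  # dummy reward


def random_starts(env, max_random_starts=30, noop_action=0):
    # Take a random number of no-op steps so that each episode begins
    # from a slightly different state. The bounds are inclusive.
    n = random.randint(1, max_random_starts)
    for _ in range(n):
        env.act(noop_action)
    return n


env = StubEnv()
steps = random_starts(env)
```

Only the number of steps is random here; the action itself stays fixed at the no-op.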
30: reward = self.env.act(0)
Right now the action is always fixed to the first action.
How about random.randint(0, self.num_actions)?