As I understand it, the action to expand is chosen purely from the prior probabilities, or simply at random.
So why is there a while loop? I don't quite understand the "if" conditions in the loop, especially this one:
    if self.children_visit_count[action] > 0 and count < 10:
        count += 1
        continue
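To make the question concrete, here is a minimal sketch of the kind of loop I mean, reconstructed under assumptions rather than copied from the original code. Only `children_visit_count`, `action`, and `count` come from the snippet above; the `Node` class, the `priors` dict, `select_action_to_expand`, and `max_retries` are hypothetical names I made up for illustration.

```python
import random


class Node:
    """Hypothetical search-tree node; only children_visit_count is from the original snippet."""

    def __init__(self, priors):
        # priors: assumed dict mapping each action to its prior probability
        self.priors = priors
        self.children_visit_count = {action: 0 for action in priors}

    def select_action_to_expand(self, max_retries=10):
        actions = list(self.priors)
        weights = [self.priors[a] for a in actions]
        count = 0
        while True:
            # Sample one action in proportion to its prior probability.
            action = random.choices(actions, weights=weights, k=1)[0]
            # If this child was already visited, resample, but only up to
            # max_retries times so the loop cannot run forever.
            if self.children_visit_count[action] > 0 and count < max_retries:
                count += 1
                continue
            return action
```

If the real code has this shape, then the condition would mean "prefer a child that has not been visited yet, but give up after 10 resamples and accept whatever was drawn". That is only my guess at the intent, which is why I am asking.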