werner-duvaud / muzero-general

MuZero
https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation
MIT License

Absorbing states? #129

Closed ziofil closed 3 years ago

ziofil commented 3 years ago

Shouldn't one use an artificial target "policy" made of all zeros here? Then the gradients would be zero, right? 🤔

https://github.com/werner-duvaud/muzero-general/blob/ed3fc8a4532bd4afe564c29c2374e27c0e17544e/replay_buffer.py#L272

werner-duvaud commented 3 years ago

Hi,

I don't think this part is detailed in the paper. I would be interested to know whether other papers handle it in a similar way.

We made the target policy uniform for absorbing states to avoid biasing the model towards any particular distribution. We also experimented with an all-zero target, and it made no noticeable difference in performance, so I don't know which is best.
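To make the two options concrete, here is a minimal sketch in plain Python. It assumes the policy loss is the usual cross-entropy between the target and the softmax of the policy logits, L = -sum_i target[i] * log softmax(logits)[i]. The helper names (`softmax`, `policy_loss_grad`) and the example logits are made up for illustration and are not taken from the repository. Under that assumption, the gradient with respect to the logits is `sum(target) * softmax(logits) - target`, so an all-zero target gives exactly zero gradient, while a uniform target pulls the predicted policy towards uniform.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_loss_grad(logits, target):
    """Gradient of L = -sum_i target[i] * log softmax(logits)[i] w.r.t. logits.

    Analytically: dL/dz_j = sum(target) * softmax(logits)[j] - target[j].
    (Hypothetical helper, written only for this illustration.)
    """
    probs = softmax(logits)
    mass = sum(target)
    return [mass * p - t for p, t in zip(probs, target)]


# Arbitrary example logits for a 3-action policy head.
logits = [2.0, 0.5, -1.0]

# Option 1: all-zero target for absorbing states (no learning signal).
zero_grad = policy_loss_grad(logits, [0.0, 0.0, 0.0])

# Option 2: uniform target for absorbing states.
uniform_grad = policy_loss_grad(logits, [1.0 / 3.0] * 3)

print("zero target gradient:   ", zero_grad)
print("uniform target gradient:", uniform_grad)
```

With the all-zero target every gradient component is zero, so the policy head is simply not trained on those positions. With the uniform target the gradient is positive for logits whose probability is above 1/3 and negative for those below, which nudges the prediction towards uniform on absorbing states.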