initial-h / AlphaZero_Gomoku_MPI

An asynchronous/parallel method of AlphaGo Zero algorithm with Gomoku

Question about experience replay #46

Closed by Sunshine-718 1 month ago

Sunshine-718 commented 1 month ago

Hello, my understanding is that AlphaZero is an on-policy algorithm, and that on-policy algorithms are not suited to experience replay. However, I see that the code uses experience replay. Is my understanding correct? https://github.com/initial-h/AlphaZero_Gomoku_MPI/blob/95867cb7e524ebe9c77a926c82091785693c3c0a/train.py#L122-L125

initial-h commented 1 month ago

It is off-policy.
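
For readers following along, here is a minimal sketch of what experience replay in this kind of setup could look like. It is not taken from the repository's train.py; the `ReplayBuffer` class, the `(state, mcts_probs, winner, policy_version)` tuple layout, and the buffer sizes are all hypothetical and chosen only for illustration. The point it shows is that a bounded buffer keeps self-play samples produced by several earlier versions of the network, so a sampled mini-batch generally mixes data from policies other than the current one.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical bounded buffer of self-play samples (illustration only)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest samples once full.
        self.data = deque(maxlen=capacity)

    def extend(self, samples):
        self.data.extend(samples)

    def sample(self, batch_size):
        # Uniformly sample without replacement, capped at the buffer size.
        k = min(batch_size, len(self.data))
        return random.sample(list(self.data), k)

    def __len__(self):
        return len(self.data)


buffer = ReplayBuffer(capacity=4)

# Simulate three training iterations; each one adds two samples tagged with
# the (assumed) version of the policy that generated them.
for version in range(3):
    samples = [
        (f"state_{version}_{i}", [0.5, 0.5], 1.0, version)  # placeholder data
        for i in range(2)
    ]
    buffer.extend(samples)

# The mini-batch may contain samples from older policy versions.
batch = buffer.sample(3)
versions_in_buffer = sorted({s[3] for s in buffer.data})
print(len(buffer), versions_in_buffer)
```

With a capacity of 4, the samples from version 0 are evicted and the buffer holds data from versions 1 and 2, so a network at version 2 would still train partly on data generated by version 1.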

Sunshine-718 commented 1 month ago

Got it, thank you!