Question about experience replay

initial-h / AlphaZero_Gomoku_MPI

An asynchronous/parallel method of AlphaGo Zero algorithm with Gomoku

Closed Sunshine-718 closed 1 month ago

Sunshine-718 commented 1 month ago

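Hello, my understanding is that AlphaZero is an on-policy algorithm, and that on-policy algorithms are not suited to experience replay. However, I see that the code uses experience replay. I would like to know whether my understanding is correct. https://github.com/initial-h/AlphaZero_Gomoku_MPI/blob/95867cb7e524ebe9c77a926c82091785693c3c0a/train.py#L122-L125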

initial-h commented 1 month ago

It is off-policy.
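
A minimal sketch of why a replay buffer makes the training data off-policy, assuming a simple fixed-size buffer. This is an illustration only, not the code in `train.py`; `ReplayBuffer`, `fake_self_play`, and the `policy_version` field are hypothetical names. The point it shows: the buffer keeps samples produced by earlier versions of the network, so a sampled minibatch is generally not generated by the policy currently being trained.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO buffer; the oldest samples are dropped when full."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def extend(self, samples):
        self.data.extend(samples)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from everything stored.
        k = min(batch_size, len(self.data))
        return rng.sample(list(self.data), k)


def fake_self_play(policy_version, n_moves=5):
    """Hypothetical stand-in for self-play: returns
    (state, mcts_probs, winner, policy_version) tuples with dummy values."""
    return [
        (f"state_{policy_version}_{i}", [1.0], 0.0, policy_version)
        for i in range(n_moves)
    ]


buffer = ReplayBuffer(capacity=12)

# Each loop iteration stands for one training round with a newer network.
for version in range(4):
    buffer.extend(fake_self_play(version))

batch = buffer.sample(8, random.Random(0))

# Policy versions that produced the data still held in the buffer.
versions_in_buffer = sorted({s[3] for s in buffer.data})
# Policy versions represented in the sampled minibatch.
versions_in_batch = sorted({s[3] for s in batch})

print("latest policy version:", 3)
print("versions in buffer:", versions_in_buffer)
print("versions in batch:", versions_in_batch)
```

With a capacity of 12 and 5 samples per round, the buffer ends up holding data from several policy versions at once, so training on it means learning from data that older policies generated.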

Sunshine-718 commented 1 month ago

OK, thank you!