WUHU-G closed this issue 1 year ago.
vpred == values??
@FastBCSD `vpred` and `values` actually represent different concepts. In this implementation, `values` refers to the values estimated by the old policy, while `vpred` refers to the values estimated by the new policy. The value loss is indeed calculated as the squared error between `returns` and the newly estimated values `vpred`. However, `returns` itself is derived from the old `values` and the newly estimated advantage. For further details, you can refer to Section 3.3 of the paper.
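To illustrate the distinction, here is a minimal sketch assuming a standard PPO-style setup with GAE. The function names `compute_advantages_and_returns` and `value_loss`, the `gamma`/`lam` defaults, and the `0.5` scaling are hypothetical choices for illustration, not this repository's code; Section 3.3 of the paper may use a different advantage estimator or add value clipping. The key point it shows is that `returns` is built once from the old `values`, while `vpred` comes from the value head being updated.

```python
from typing import List, Tuple


def compute_advantages_and_returns(
    rewards: List[float],
    values: List[float],  # old value estimates, held fixed during the update
    gamma: float = 1.0,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """GAE over a single trajectory; assumes the value after the last step is 0."""
    n = len(rewards)
    advantages = [0.0] * n
    last_gae = 0.0
    for t in reversed(range(n)):
        next_value = values[t + 1] if t + 1 < n else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD error
        last_gae = delta + gamma * lam * last_gae
        advantages[t] = last_gae
    # returns are derived from the OLD values plus the advantages
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def value_loss(vpred: List[float], returns: List[float]) -> float:
    """Half the mean squared error between new value predictions and fixed returns."""
    return 0.5 * sum((vp - r) ** 2 for vp, r in zip(vpred, returns)) / len(returns)


# Hypothetical usage: one 3-step trajectory with a reward only at the end.
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.6, 0.7]  # from the old policy's value head
advantages, returns = compute_advantages_and_returns(rewards, values, gamma=1.0, lam=1.0)

# Before the first gradient step the new value head has not changed yet,
# so vpred is numerically equal to values; the loss is still nonzero
# because returns differs from values.
vpred = list(values)
print(value_loss(vpred, returns))
```

With `gamma=1.0` and `lam=1.0` every return in this example is 1.0, so the first loss is about 0.083 even though `vpred` equals `values` at that moment. As the update proceeds, `vpred` moves toward `returns` while `values` and `returns` stay fixed.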