Closed by haoyusoong 6 years ago
I may have figured it out. "2 .actor-critic training" is step 3 of Algorithm 2 in the paper.