Closed by ryanjulian 6 years ago
The baseline should be optimized before the policies, because the augmented reward and return values depend on network parameters
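To illustrate the ordering, here is a minimal sketch, not the project's actual code. It assumes the dependency on network parameters comes from an entropy bonus computed from the policy's current parameters; the issue does not say which parameters are meant. `entropy`, `augmented_returns`, `MeanBaseline`, and `train_step` are hypothetical names, and the policy update is a stand-in rather than a real gradient step.

```python
import math


def entropy(theta):
    """Entropy of a Bernoulli policy whose logit is theta (toy policy, assumed)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def augmented_returns(rewards, theta, ent_coeff=0.1, discount=0.99):
    """Discounted returns of entropy-augmented rewards.

    The bonus is computed from theta, so the returns change whenever
    the policy parameters change.
    """
    bonus = ent_coeff * entropy(theta)
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = (r + bonus) + discount * running
        returns.append(running)
    return returns[::-1]


class MeanBaseline:
    """Hypothetical baseline that predicts the mean of its fitting targets."""

    def __init__(self):
        self.value = 0.0

    def fit(self, targets):
        self.value = sum(targets) / len(targets)


def train_step(rewards, theta, baseline, step=1.0):
    # 1. Compute targets with the parameters that collected the data.
    targets = augmented_returns(rewards, theta)
    # 2. Optimize the baseline first, while those targets are still valid.
    baseline.fit(targets)
    # 3. Only then change the policy (stand-in for a real policy update).
    return theta + step


rewards = [1.0, 0.0, 1.0]
baseline = MeanBaseline()
old_theta = 0.0
new_theta = train_step(rewards, old_theta, baseline)

# Targets recomputed after the policy update no longer match the ones
# the baseline should have been fit on.
fresh = augmented_returns(rewards, old_theta)
stale = augmented_returns(rewards, new_theta)
```

In this toy setup, fitting the baseline after step 3 would use `stale` instead of `fresh`, so the baseline would be trained against returns that the data-collecting policy never produced.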