ryanjulian / embed2learn

Embedding to Learn

Fix baseline optimization order #38

Closed by ryanjulian 6 years ago

ryanjulian commented 6 years ago

The baseline should be optimized before the policies, because the augmented reward and return values depend on the network parameters, which the policy optimization step changes.
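To illustrate the ordering, here is a minimal toy sketch. It is not embed2learn's actual code: `augmented_returns`, `MeanBaseline`, `train_step`, the scalar `theta`, and the bonus coefficient are all hypothetical stand-ins. It assumes only that the augmented reward is the environment reward plus a term that depends on the current parameters.

```python
# Toy illustration only. Every name below is hypothetical and does not
# correspond to embed2learn's real API.


def augmented_returns(rewards, theta, coeff=0.1, discount=0.99):
    """Discounted returns of rewards augmented by a parameter-dependent bonus."""
    augmented = [r + coeff * theta for r in rewards]
    returns, running = [], 0.0
    for r in reversed(augmented):
        running = r + discount * running
        returns.append(running)
    return returns[::-1]


class MeanBaseline:
    """Trivial baseline that predicts the mean of the returns it was fit on."""

    def __init__(self):
        self.value = 0.0

    def fit(self, returns):
        self.value = sum(returns) / len(returns)

    def predict(self, n):
        return [self.value] * n


def train_step(rewards, theta, baseline, lr=0.5):
    # 1. Compute augmented returns with the parameters that generated the samples.
    returns = augmented_returns(rewards, theta)
    # 2. Fit the baseline first, while those returns are still consistent with theta.
    baseline.fit(returns)
    # 3. Compute advantages against the freshly fit baseline.
    advantages = [g - b for g, b in zip(returns, baseline.predict(len(returns)))]
    # 4. Only now update the policy. This placeholder step just shifts theta.
    new_theta = theta + lr
    return new_theta, returns, advantages


rewards = [1.0, 1.0]
baseline = MeanBaseline()
new_theta, returns, advantages = train_step(rewards, 0.0, baseline)

# Recomputing the returns after the policy update gives different targets,
# which is why the baseline is fit before step 4.
stale_targets = augmented_returns(rewards, new_theta)
print(returns, stale_targets)
```

If step 4 ran before step 2 and the returns were recomputed, the baseline would be regressed onto values produced by the updated parameters rather than by the parameters that collected the samples.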