inoryy / reaver

Reaver: Modular Deep Reinforcement Learning Framework. Focused on StarCraft II. Supports Gym, Atari, and MuJoCo.
MIT License

Separate std variable for continuous policies #15

Closed inoryy closed 5 years ago

inoryy commented 5 years ago

It seems that using a single, separate trainable variable for the (log?) standard deviation is more popular than making it an output of the network, e.g. (Schulman et al., 2015). We should probably switch to this approach instead of the current implementation, at least while comparing algorithms against baselines.

Can't use tf.get_variable() for this though, since it goes away in TensorFlow 2.0.
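
To make the idea concrete, here is a rough sketch in plain Python (deliberately not TensorFlow, and not Reaver's actual code). The class name DiagGaussianPolicy, the linear "network" in mean(), and the log_std_init argument are all made up for illustration. The point is only that log_std is a free parameter vector, one entry per action dimension, that does not depend on the observation, while the mean still comes from the network.

```python
import math
import random


class DiagGaussianPolicy:
    """Diagonal Gaussian policy with a state-independent log-std (illustrative only)."""

    def __init__(self, weights, log_std_init=0.0):
        # weights: one row per action dimension; a stand-in for the real network.
        self.weights = weights
        # One free parameter per action dimension, NOT a function of the observation.
        self.log_std = [log_std_init] * len(weights)

    def mean(self, obs):
        # Hypothetical "network": a linear map from observation to action mean.
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]

    def sample(self, obs, rng=random):
        # a = mean(obs) + exp(log_std) * eps, with eps ~ N(0, 1) per dimension.
        return [m + math.exp(ls) * rng.gauss(0.0, 1.0)
                for m, ls in zip(self.mean(obs), self.log_std)]

    def log_prob(self, obs, action):
        # Sum over dimensions of log N(a | mean, exp(log_std)^2).
        total = 0.0
        for a, m, ls in zip(action, self.mean(obs), self.log_std):
            z = (a - m) / math.exp(ls)
            total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
        return total
```

In a TF 2.x version the log_std list would presumably become a tf.Variable owned by the policy/model object and trained alongside the network weights, rather than something created through tf.get_variable().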