Open jendelel opened 5 years ago
Currently, I sample from GaussianPolicy. That probably isn't correct. Perhaps the continuous action extension of DeepCoach should be more like DDPG.
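For reference, here is a minimal sketch of the two options being compared. It is not the code in this repo: the function names, the `[-1, 1]` action bounds, and the fixed `mean` vector standing in for a policy network's output are all assumptions made for illustration. The first function draws the action itself from a Gaussian, as a stochastic policy does. The second treats the network output as the action and adds noise only for exploration, which is roughly how DDPG selects actions.

```python
import random


def gaussian_policy_action(mean, std, rng):
    """Stochastic policy: the action is a draw from N(mean, std^2) per dimension."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def ddpg_style_action(mean, noise_std, rng, low=-1.0, high=1.0, explore=True):
    """Deterministic policy: the action is the network output itself.

    Gaussian noise is added only while exploring, and the result is clipped
    to the (assumed) action bounds [low, high].
    """
    action = list(mean)
    if explore:
        action = [a + rng.gauss(0.0, noise_std) for a in action]
    return [min(max(a, low), high) for a in action]


rng = random.Random(0)
mean = [0.2, -0.5]  # hypothetical stand-in for a policy network's output
std = [0.3, 0.3]    # hypothetical per-dimension standard deviation

print(gaussian_policy_action(mean, std, rng))            # random draw around mean
print(ddpg_style_action(mean, 0.1, rng))                 # mean plus exploration noise
print(ddpg_style_action(mean, 0.1, rng, explore=False))  # mean, clipped to bounds
```

The practical difference for DeepCoach would be in the update: a sampled action has a log-probability that a policy-gradient style update can use, while a DDPG-like deterministic action does not, so the feedback would have to enter the update some other way.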