will-maclean closed 2 weeks ago
Fixed a buffer bug that was breaking probe envs 2 and higher; probe envs 1-4 now work
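Discovered that with ReLU and hidden size = 1, training can be impossible (probably a dead ReLU: if the single unit's pre-activation is negative for every input, its gradient is zero and it can't recover)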
Note: in smaller networks, especially with the advantage DQN, sigmoid seems more stable than ReLU
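
A toy sketch of why a single ReLU unit might get stuck where a sigmoid one does not. This is not the repo's network: it is a hand-rolled `y = w2 * act(w1*x + b1)` with one hidden unit, and the weights, inputs and targets below are made-up values chosen so the pre-activation is negative for every input.

```python
import math

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def hidden_grads(act, act_grad, w1, b1, w2, xs, targets):
    """Mean gradient of 0.5 * (y - t)**2 w.r.t. w1 and b1,
    for the one-hidden-unit model y = w2 * act(w1 * x + b1)."""
    gw1 = gb1 = 0.0
    for x, t in zip(xs, targets):
        z = w1 * x + b1
        err = w2 * act(z) - t          # dL/dy
        dz = err * w2 * act_grad(z)    # chain rule through the activation
        gw1 += dz * x
        gb1 += dz
    n = len(xs)
    return gw1 / n, gb1 / n

# Hypothetical values: both pre-activations (-0.1 and -0.6) are negative.
xs, targets = [0.0, 1.0], [1.0, -1.0]
w1, b1, w2 = -0.5, -0.1, 1.0

relu_g = hidden_grads(relu, relu_grad, w1, b1, w2, xs, targets)
sig_g = hidden_grads(sigmoid, sigmoid_grad, w1, b1, w2, xs, targets)

print("ReLU    grads (w1, b1):", relu_g)  # both zero: the unit cannot move
print("sigmoid grads (w1, b1):", sig_g)   # both non-zero: it can still learn
```

With hidden size = 1 there is no second unit to carry the signal, so once the ReLU unit is dead the hidden layer receives no gradient at all. Sigmoid always has a non-zero slope, which may be part of why it looks more stable in very small networks.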
Not working well, not sure why