Closed: joon0503 closed this issue 5 years ago
From the simulation, it seems that the agent learns faster when the dueling network is used.
However, after ~5000 episodes, the output value of the network converges to 0 and the gradient vanishes.
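
For reference, here is a minimal sketch of the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A(s, .)), together with a simple check for outputs collapsing toward 0. This is not the code from this repository: the function names `dueling_q` and `mean_abs_q` are hypothetical, and it assumes a single state with a scalar value stream and a list of per-action advantages.

```python
from typing import List


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Combine the value and advantage streams into Q-values for one state.

    Uses the mean-subtracted form: Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def mean_abs_q(q_values: List[float]) -> float:
    """Average magnitude of the Q-values; a value near 0 suggests collapse."""
    return sum(abs(q) for q in q_values) / len(q_values)


# Hypothetical stream outputs for one state with three actions.
q = dueling_q(1.0, [1.0, 2.0, 3.0])
collapse_metric = mean_abs_q(q)
```

Logging something like `mean_abs_q` every few episodes, next to the gradient norm, might help pin down when the outputs start shrinking toward 0.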