takuseno closed this 6 years ago
seems to learn something
reference (DQN performance)
beat DQN, but unstable
Great work! We might need to plot the reward graph with something other than TensorBoard, simply because its plots are unreliable.
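One way to do that, as a rough sketch only: write each episode's reward to a plain CSV file during training and smooth it ourselves before plotting with whatever tool we like. The function names (`log_episode`, `load_rewards`, `moving_average`), the two-column CSV layout, and the window size are all my own assumptions, not anything that exists in this repo.

```python
import csv
from collections import deque


def log_episode(path, step, reward):
    """Append one (global step, episode reward) row to a CSV file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([step, reward])


def load_rewards(path):
    """Read the logged rows back as two lists: steps and rewards."""
    steps, rewards = [], []
    with open(path, newline="") as f:
        for step, reward in csv.reader(f):
            steps.append(int(step))
            rewards.append(float(reward))
    return steps, rewards


def moving_average(values, window=10):
    """Trailing mean over the last `window` values (fewer at the start)."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out
```

Calling `log_episode` at the end of every episode and then plotting `moving_average(rewards)` against `steps` would give a curve that can be compared directly with the DQN reference, since both runs go through the same smoothing.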