Reinforcement Learning with negative rewards?

karpathy / convnetjs

Deep Learning in Javascript. Train Convolutional Neural Networks (or ordinary ones) in your browser.

I wrote a very simple simulation to test the Reinforcement Learning module. The only input is the current action, and the output is either "left" or "right". Going right feeds a reward of 1 back into the network, while going left feeds back a reward of -1.
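For comparison, here is a minimal standalone sketch of the same reward scheme (right gives 1, left gives -1) driven by a plain epsilon-greedy learner. It does not use convnetjs at all; `rewardFor`, `makeRng`, `train`, the step count, the exploration rate and the seed are all hypothetical names and values chosen for illustration. It only shows what a learner is expected to do with these rewards, assuming the reward is credited to the action that was actually taken.

```javascript
// Standalone sanity check; does NOT use convnetjs. All names are hypothetical.
const ACTIONS = ["left", "right"];

// Reward scheme from the post: right -> +1, left -> -1.
function rewardFor(actionIndex) {
  return ACTIONS[actionIndex] === "right" ? 1 : -1;
}

// Small deterministic pseudo-random generator so runs are repeatable.
function makeRng(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Epsilon-greedy agent keeping a running-average value estimate per action.
function train(steps, epsilon, seed) {
  const rng = makeRng(seed);
  const q = [0, 0];      // estimated value of "left" and "right"
  const counts = [0, 0]; // how often each action was taken
  let total = 0;
  for (let i = 0; i < steps; i++) {
    const greedy = q[1] > q[0] ? 1 : 0;
    const explore = rng() < epsilon;
    const action = explore ? Math.floor(rng() * ACTIONS.length) : greedy;
    const r = rewardFor(action);
    counts[action] += 1;
    // Incremental mean: move the estimate toward the observed reward.
    q[action] += (r - q[action]) / counts[action];
    total += r;
  }
  const preferred = ACTIONS[q[1] > q[0] ? 1 : 0];
  return { q, preferred, averageReward: total / steps };
}

const result = train(2000, 0.1, 42);
console.log(result.preferred, result.averageReward.toFixed(2));
```

If a sketch like this settles on "right" while the real simulation settles on "left", two things that might be worth checking are whether the action index returned by the network maps to the direction you think it does, and whether each reward is fed back for the action that was just taken rather than the previous or next one.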

To my astonishment, when I came back an hour later after letting it train, the 'creature' was moving very confidently to the left, and only to the left! Needless to say, the network's average reward was negative. What could explain this?

As for the network setup, I basically copied all the settings from your apples/poison example, including the layer definitions.

Reinforcement Learning with negative rewards? #59