karpathy / convnetjs

Deep Learning in Javascript. Train Convolutional Neural Networks (or ordinary ones) in your browser.
MIT License

Reinforcement Learning with negative rewards? #59

Open cosmin-novac opened 8 years ago

cosmin-novac commented 8 years ago

I wrote a very simple simulation to test the Reinforcement Learning module. The only input is the current action, and the output is "left" or "right". Going right feeds a reward of 1 back into the network, while going left feeds back a reward of -1.

To my astonishment, when I came back an hour later after letting it train, the 'creature' was moving very confidently to the left, and only to the left! Needless to say, the network's average reward was negative. What could explain this?

Regarding the setup of the network, I basically copied all settings from your apples/poison example - including the layer defs.
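
For comparison, here is a minimal tabular Q-learning sketch of the two-action task described above, written in plain JavaScript without convnetjs. It is a sanity-check baseline, not the poster's code: the state encoding (previous action as state), the helper names `makeRng` and `trainTabular`, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration.

```javascript
// Tabular Q-learning baseline for the left/right task (illustrative sketch).
// Assumption: the state is the previous action, as the post suggests.
const LEFT = 0;
const RIGHT = 1;

// Reward as described in the post: right yields +1, left yields -1.
function reward(action) {
  return action === RIGHT ? 1 : -1;
}

// Small seeded linear congruential generator so runs are reproducible.
function makeRng(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Hypothetical helper: runs epsilon-greedy Q-learning and returns the Q-table.
function trainTabular(steps, opts) {
  const { alpha, gamma, epsilon, seed } = opts;
  const rand = makeRng(seed);
  // q[state][action], all estimates start at zero.
  const q = [[0, 0], [0, 0]];
  let state = LEFT;
  for (let i = 0; i < steps; i++) {
    // Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
    const greedy = q[state][RIGHT] >= q[state][LEFT] ? RIGHT : LEFT;
    const explore = rand() < epsilon;
    const action = explore ? (rand() < 0.5 ? LEFT : RIGHT) : greedy;

    const r = reward(action);
    const next = action; // the next state is the action just taken

    // Standard Q-learning update toward the bootstrapped target.
    const target = r + gamma * Math.max(q[next][LEFT], q[next][RIGHT]);
    q[state][action] += alpha * (target - q[state][action]);
    state = next;
  }
  return q;
}

const qTable = trainTabular(5000, { alpha: 0.1, gamma: 0.7, epsilon: 0.2, seed: 1 });
console.log(qTable);
```

In this sketch the learned value of going right should end up above the value of going left in both states. If a baseline like this behaves as expected while the network still insists on going left, that would hint at the reward wiring or the network hyperparameters rather than the task itself, though it does not pinpoint the cause.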

taralloc commented 8 years ago

Welcome to the world of Q-learning with function approximation, especially with neural networks! Convergence is not guaranteed, and in practice it fails to converge more often than it converges. A lot of parameter tweaking is necessary. It would be awesome if someone with more experience could share some insight on why this happens and how to improve the algorithm.
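
For readers unfamiliar with the update being discussed, the standard Q-learning rule and its function-approximation counterpart can be written as below. The symbols are assumptions introduced here, not taken from the thread: $\alpha$ is the learning rate, $\gamma$ the discount factor, and $\theta$ the network parameters.

```latex
% Tabular Q-learning update
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

% Semi-gradient update with a parameterised approximator Q_\theta
\theta \leftarrow \theta + \alpha \left[ r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a) \right] \nabla_\theta Q_\theta(s, a)
```

In the second form the target $r + \gamma \max_{a'} Q_\theta(s', a')$ depends on the same parameters $\theta$ that are being updated, which is one commonly cited reason the convergence guarantees of the tabular case do not carry over.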