Open mryellow opened 9 years ago
This might work a little better: it falls off quickly on the low end, instead of granting the forward reward the instant walls are considered "clear".
if(this.actionix === 0 && proximity_reward > 0.2) forward_reward = 0.1 * Math.sqrt(proximity_reward-0.2);
edit: Actually, it probably behaves better the other way; sqrt will squeeze through some pretty small gaps, though.
if(this.actionix === 0) forward_reward = 0.1 * Math.pow(proximity_reward, 2);
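To compare the two shapes side by side, here is a minimal sketch. The helper names are hypothetical (they are not in the demo source), and it assumes proximity_reward is normalised to [0, 1], with 1 meaning no walls nearby.

```javascript
// Hypothetical helpers for illustration; names are not from the demo source.
// Assumes proximity is in [0, 1], where 1 means no walls nearby.

// sqrt variant: zero at or below the cutoff, then rises steeply just above it.
function forwardRewardSqrt(proximity, threshold) {
  if (proximity <= threshold) return 0;
  return 0.1 * Math.sqrt(proximity - threshold);
}

// Squared variant: no cutoff, stays small until proximity is close to 1.
function forwardRewardSquared(proximity) {
  return 0.1 * Math.pow(proximity, 2);
}

// Print both rewards at a few sample proximities.
[0.2, 0.3, 0.5, 0.75, 1.0].forEach(function (p) {
  console.log(
    'proximity', p,
    'sqrt', forwardRewardSqrt(p, 0.2).toFixed(4),
    'squared', forwardRewardSquared(p).toFixed(4)
  );
});
```

At a proximity of 0.3 the sqrt variant already pays about 0.03 while the squared variant pays about 0.009, which fits the observation that sqrt is more willing to push through tight gaps.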
I'm finding generally that dropout (regardless of whether uncertainty is implemented or not) will become obsessed with any conditional reward that jumps up or down out of nowhere.
For instance, halving the forward reward for forward turns:
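// Actions 0, 1 and 2 all move forward; 1 and 2 also turn.
if (this.actionix === 0 || this.actionix === 1 || this.actionix === 2) {
  forward_reward = base_forward_reward; // placeholder name: whatever number
  if (this.actionix === 1 || this.actionix === 2) {
    // Forward turns only get half the forward reward.
    forward_reward = forward_reward / 2;
  }
}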
Dropout will find itself hard up against a wall, looking along it, exploiting what it can from the halved forward reward. Smoothly distributed rewards, on the other hand, will be exploited without so much unexpected behavior.
http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html
That's the > 0.75 threshold for forward reward. With a few eyes missing the walls, the overall proximity drops more than it does in most cases; if the agent can get a little bonus for forward at that stage, it will take it.
https://github.com/yaringal/DropoutUncertaintyDemos/blob/14fa4689bcf29e280bf3bb5c967f8bf10e530178/convnetjs/rldemo_comparison.js#L368
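To make the jump at that threshold concrete, here is a rough sketch. It assumes the thresholded reward has the form 0.1 * proximity_reward above 0.75 and zero otherwise (my reading of the linked line, not verified here), and compares it with the squared form from earlier in this thread. All function names are hypothetical.

```javascript
// Assumed form of the thresholded reward: nothing at or below the threshold,
// then 0.1 * proximity above it. This form is an assumption, not verified.
function thresholdedForward(proximity, threshold) {
  return proximity > threshold ? 0.1 * proximity : 0;
}

// Smooth alternative suggested earlier in the thread.
function squaredForward(proximity) {
  return 0.1 * Math.pow(proximity, 2);
}

// Change in reward across a small step in proximity.
function jumpAcross(rewardFn, below, above) {
  return rewardFn(above) - rewardFn(below);
}

var stepJump = jumpAcross(function (p) {
  return thresholdedForward(p, 0.75);
}, 0.75, 0.751);
var smoothJump = jumpAcross(squaredForward, 0.75, 0.751);

console.log('thresholded jump:', stepJump.toFixed(4));
console.log('squared jump:', smoothJump.toFixed(4));
```

Across the same tiny step in proximity, the thresholded reward moves by roughly 0.075 while the squared form barely changes. That sudden bonus is the kind of discontinuity the agent seems to latch onto.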
Generally I've found the threshold still works OK. It takes tweaking, but in the end it is kind of a "this is a doorway you'll accept" vs "that's a little too risky" setting. I think the best bet would be to remove it and punish harder on walls some other way, so the forward bonus can't win out against walls when multiplied by those last few decimal points of the proximity being fed in.