nazaruka / gym-http-api

NSGA2-based Sonic agent + experimental code
MIT License

Stop dead-end learning #39

Open schrum2 opened 5 years ago

schrum2 commented 5 years ago

Sometimes the starting weights are so bad that every learning update has a reward of 0. The agent is not learning anything ... it's just wasting time. We already terminate evaluation early if the agent stands still for too long. During learning, we should also stop early if some number of learning updates in a row all have 0 reward (occasional zeros are fine, but an unbroken run of zeros is not). Make that number a command line parameter.

schrum2 commented 5 years ago

You take this issue and I'll deal with the behavior archive. This issue just requires a counter in the learn function that exits the outer loop if the updates keep having 0 reward.
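A minimal sketch of the counter described above, assuming the learning loop can be reduced to a function that performs one update and returns its reward. The names `learn`, `run_update`, `--max-zero-reward-updates`, and the default of 10 are hypothetical illustrations, not the repository's actual code or flags.

```python
import argparse
from typing import Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sonic learning (sketch)")
    # Hypothetical flag name and default: the issue only asks for
    # "a command line parameter".
    parser.add_argument(
        "--max-zero-reward-updates",
        type=int,
        default=10,
        help="stop learning after this many consecutive updates with 0 reward",
    )
    return parser.parse_args(argv)


def learn(run_update: Callable[[], float], num_updates: int,
          max_zero_reward_updates: int) -> int:
    """Run up to num_updates learning updates and return how many actually ran.

    run_update is a hypothetical stand-in for one learning update; it returns
    the reward collected during that update.
    """
    zero_streak = 0  # consecutive updates with exactly 0 reward
    completed = 0
    for _ in range(num_updates):  # the "outer loop" mentioned in the issue
        reward = run_update()
        completed += 1
        # Isolated zeros are fine: any nonzero reward resets the streak.
        zero_streak = zero_streak + 1 if reward == 0 else 0
        if zero_streak >= max_zero_reward_updates:
            break  # dead end: these weights are not earning any reward
    return completed


if __name__ == "__main__":
    args = parse_args(["--max-zero-reward-updates", "3"])
    rewards = iter([0.0, 5.0, 0.0, 0.0, 0.0, 7.0])
    ran = learn(lambda: next(rewards), 6, args.max_zero_reward_updates)
    print(ran)  # 5: the third zero in a row stops learning before the 7.0
```

Resetting the streak on any nonzero reward is what keeps occasional zeros from ending learning; only an unbroken run of them does.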

schrum2 commented 5 years ago

This seems to work fine.

schrum2 commented 5 years ago

It might make more sense to base this on an unchanging x-position than on repeated rewards of 0. Keeping this open to consider that approach ... we could actually have two different conditions for ending learning early.

Specifically, add a parameter for how many learning steps in a row it is ok for the x coordinate to be unchanging. Then, count this number in the learn method and break out of the main loop if the count exceeds the threshold.
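A sketch of the x-position condition, kept alongside the zero-reward condition so that either one can end learning early. It assumes each learning update can report both its reward and the agent's x coordinate afterwards; `run_update`, `max_unchanged_x_updates`, and the `(reward, x)` return shape are hypothetical, not taken from the repository.

```python
from typing import Callable, Optional, Tuple


def learn(run_update: Callable[[], Tuple[float, int]], num_updates: int,
          max_zero_reward_updates: int, max_unchanged_x_updates: int) -> int:
    """Run up to num_updates learning updates with two early-stop conditions.

    run_update is a hypothetical stand-in returning (reward, x_position)
    for one learning update. Returns the number of updates that ran.
    """
    zero_streak = 0
    unchanged_x_streak = 0  # consecutive updates where x did not change
    last_x: Optional[int] = None
    completed = 0
    for _ in range(num_updates):
        reward, x = run_update()
        completed += 1
        zero_streak = zero_streak + 1 if reward == 0 else 0
        unchanged_x_streak = unchanged_x_streak + 1 if x == last_x else 0
        last_x = x
        # Condition 1: a run of zero-reward updates.
        if zero_streak >= max_zero_reward_updates:
            break
        # Condition 2: the parameter is how many unchanged steps are still OK,
        # so stop only once the count exceeds it.
        if unchanged_x_streak > max_unchanged_x_updates:
            break
    return completed


if __name__ == "__main__":
    steps = iter([(1.0, 10), (0.5, 10), (0.5, 10), (0.5, 10)])
    print(learn(lambda: next(steps), 4, max_zero_reward_updates=100,
                max_unchanged_x_updates=1))  # 3
```

Here x is compared with exact equality, which suits integer positions; if positions were floats, a small tolerance would likely be preferable.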