The reward function encourages short episodes

MHamza-Y commented 3 years ago

I am trying to train a reinforcement learning agent to control the basal rate using the provided gym environment. The problem with the reward function is that it encourages episodes to be as short as possible. I have tried different algorithms and hyperparameter variations, but the policy always learns to output either 0 or the maximum basal value, apparently to avoid accumulating more penalty over a long episode. Can the reward function be improved somehow?

lorenzobrigato commented 2 years ago

Any updates on this, or solutions? I am also seeing similar behavior across different algorithms and hyperparameters.

jxx123 commented 1 year ago

The documentation has a section showing how to use a custom reward function, https://github.com/jxx123/simglucose#openai-gym-usage, which serves exactly your purpose of tuning the reward function.

The default reward function is not intended to give you a nice reward (especially long-term reward), and you are supposed to define your own reward function.

For posterity, though, it would be nice if anyone could share their insights and carefully designed reward functions here. I could collect them and put them in the documentation for visibility (with your name shown, of course, to give you credit).
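As a starting point for that kind of sharing, here is a minimal sketch of a reward function that should not favor short episodes: every step the patient stays within safe bounds earns a non-negative reward, and only a dangerous reading is penalized heavily. The function name `survival_reward`, the thresholds, and the reward magnitudes are all illustrative assumptions, not values from simglucose; the only thing taken from the README is that a custom reward receives the blood glucose readings of the last hour.

```python
def survival_reward(bg_last_hour, low=70.0, high=180.0,
                    severe_low=40.0, severe_high=350.0):
    """Reward based on the most recent blood glucose reading (mg/dL).

    All thresholds and magnitudes are assumptions chosen for illustration.
    """
    bg = bg_last_hour[-1]  # latest reading in the one-hour window

    # Dangerous reading: one large penalty, so driving the patient out of
    # bounds to end the episode is never cheaper than staying alive.
    if bg <= severe_low or bg >= severe_high:
        return -100.0

    # Inside the target range: full per-step reward.
    if low <= bg <= high:
        return 1.0

    # Outside the target range but not dangerous: small positive reward,
    # so a longer episode still accumulates more return than a shorter one.
    return 0.1
```

Following the custom reward section of the README linked above, a function like this would be passed as the `reward_fun` keyword argument when registering the `simglucose.envs:T1DSimEnv` environment. Because no safe step is negative, the return grows with episode length, which removes the incentive to terminate early; whether it yields good glucose control still depends on tuning the thresholds and penalty.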

jxx123/simglucose, issue #40