yandexdataschool / Practical_RL

A course in reinforcement learning in the wild
The Unlicense
5.87k stars 1.68k forks

fix regret calculation in week5 #514

laktionov closed 1 year ago

laktionov commented 1 year ago

In the function get_regret, the line optimal_reward - reward does not match the definition of regret (expected optimal reward minus the expected reward of the chosen action). Moreover, since optimal_reward is a number less than or equal to 1 and reward itself is discrete (0 or 1), the cumulative regret computed by the existing implementation can be non-monotonic.
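To illustrate the difference, here is a minimal sketch, not the repository's actual code. It assumes a Bernoulli bandit described by a list of per-arm success probabilities; the helper names `realized_gap` and `expected_regret`, and the sample `probs`, `actions`, and `rewards`, are hypothetical and chosen only for this example.

```python
import numpy as np


def realized_gap(optimal_reward, rewards):
    """Cumulative sum of (optimal expected reward - sampled reward).

    This mirrors the `optimal_reward - reward` pattern described above.
    Individual terms are negative whenever reward == 1 and
    optimal_reward < 1, so the curve can go down.
    """
    return np.cumsum(optimal_reward - np.asarray(rewards, dtype=float))


def expected_regret(probs, actions):
    """Cumulative sum of (optimal expected reward - expected reward of the chosen arm).

    Every term is >= 0, so the curve never decreases.
    """
    probs = np.asarray(probs, dtype=float)
    gaps = probs.max() - probs[np.asarray(actions)]
    return np.cumsum(gaps)


# Hypothetical two-armed bandit and a short hand-written history.
probs = [0.9, 0.5]        # assumed per-arm success probabilities
actions = [1, 0, 1, 0]    # arms chosen at each step
rewards = [1, 1, 0, 1]    # sampled 0/1 rewards for those choices

print(realized_gap(max(probs), rewards))
print(expected_regret(probs, actions))
```

With this history, the first curve dips whenever a reward of 1 is observed, while the second only grows on steps where the suboptimal arm was chosen.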