yandexdataschool / Practical_RL

A course in reinforcement learning in the wild

fix regret calculation in week5 #515

Closed · laktionov closed 1 year ago

laktionov commented 1 year ago

In the function `get_regret`, the line `optimal_reward - reward` does not correspond to the definition of regret (expected optimal reward minus the expected reward of the chosen action). Moreover, since `optimal_reward` is a number less than or equal to 1 while the sampled `reward` is discrete (0 or 1), the cumulative regret in the existing implementation can be non-monotonic.
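To spell the definition out (this is my paraphrase of the standard bandit definition, not a formula from the notebook): writing $\mu_a$ for the expected reward of arm $a$, $\mu^* = \max_a \mu_a$, and $a_t$ for the arm pulled at step $t$, the cumulative regret is

$$R_T = \sum_{t=1}^{T} \left( \mu^* - \mu_{a_t} \right),$$

where each term is non-negative, so $R_T$ is non-decreasing in $T$. Subtracting the sampled 0/1 reward instead of $\mu_{a_t}$ makes individual terms negative whenever a pull succeeds, which is exactly what produces the non-monotonic curves.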

review-notebook-app[bot] commented 1 year ago

Check out this pull request on ReviewNB

dniku commented 1 year ago

I've looked into your proposed changes more closely (and checked a few online sources for definitions of regret), and I realize my mistake. One source that cites the exact same difference in definitions is here; the author specifically notes that this definition of regret is problematic because each term is no longer non-negative, which is what you described as non-monotonicity.
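Concretely, the corrected accumulation would look roughly like this (a minimal sketch; the bandit interface with `optimal_reward()`, `action_value()`, and `pull()` is assumed for illustration and may not match the notebook's exact API):

```python
import numpy as np

def get_regret(env, agents, n_steps=5000, n_trials=50):
    """Sketch of regret tracking that accumulates *expected* regret.

    Assumed (illustrative) bandit interface:
      env.optimal_reward()      -- mean reward of the best arm
      env.action_value(action)  -- mean reward of the chosen arm
      env.pull(action)          -- sampled 0/1 reward, used for learning only
    """
    scores = {agent.name: np.zeros(n_steps) for agent in agents}

    for _ in range(n_trials):
        env.reset()
        for agent in agents:
            agent.init_actions(env.action_count)

        for t in range(n_steps):
            optimal_reward = env.optimal_reward()
            for agent in agents:
                action = agent.get_action()
                reward = env.pull(action)   # sampled reward: the agent trains on this
                agent.update(action, reward)
                # Regret uses expected values, so each term is >= 0
                # and the cumulative curve is monotone.
                scores[agent.name][t] += optimal_reward - env.action_value(action)

    # Average over trials and accumulate over steps.
    return {name: np.cumsum(s) / n_trials for name, s in scores.items()}
```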

Apologies for the earlier misunderstanding. I'm merging the PR now.