laktionov closed this 1 year ago
I've looked into your proposed changes more closely (and checked a few online sources for regret definitions), and I realize my mistake. One source that points out the exact same difference in definitions is here; the author specifically mentions that this definition of regret is problematic because each term is no longer guaranteed to be non-negative (what you described as non-monotonicity).
Apologies for the earlier misunderstanding. I'm merging the PR now.
In the function get_regret, the line optimal_reward - reward does not correspond to the definition of regret (expected optimal reward - expected reward of the chosen action). Moreover, since optimal_reward is a number less than or equal to 1 and reward itself is discrete (0 or 1), individual terms can be negative in the existing implementation, so the cumulative regret can be non-monotonic.
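To make the difference concrete, here is a minimal sketch that contrasts the two definitions on a Bernoulli bandit. It is not the repository's code: the arm means in `probs`, the uniformly random policy, and the helper names `realized_regret`, `expected_regret`, `cumulative`, and `is_non_decreasing` are all assumptions made for illustration.

```python
import random


def realized_regret(optimal_reward, reward):
    # Existing approach as described above: uses the sampled reward (0 or 1),
    # so the term is negative whenever reward = 1 and optimal_reward < 1.
    return optimal_reward - reward


def expected_regret(probs, action):
    # Definition from the PR description: expected optimal reward minus
    # expected reward of the chosen action. Never negative.
    return max(probs) - probs[action]


def cumulative(values):
    # Running sum of per-step regret terms.
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


def is_non_decreasing(xs):
    return all(b >= a for a, b in zip(xs, xs[1:]))


probs = [0.3, 0.7]  # hypothetical Bernoulli arm means
rng = random.Random(0)

# Hypothetical policy: pick an arm uniformly at random for 200 steps.
actions = [rng.randrange(len(probs)) for _ in range(200)]
rewards = [1 if rng.random() < probs[a] else 0 for a in actions]

realized = cumulative(realized_regret(max(probs), r) for r in rewards)
expected = cumulative(expected_regret(probs, a) for a in actions)

print("realized curve non-decreasing:", is_non_decreasing(realized))
print("expected curve non-decreasing:", is_non_decreasing(expected))
```

Under these assumptions, every term of the expected-regret curve is non-negative, so the curve can only stay flat or grow. The realized curve drops by 0.3 each time a reward of 1 is sampled, which is the non-monotonic behaviour described above.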