In the function `get_regret`, the line `optimal_reward - reward` does not correspond to the definition of regret (expected optimal reward minus expected reward of the chosen action).
Moreover, since `optimal_reward` is a number less than or equal to 1 and `reward` itself is discrete (0 or 1), the value computed by the existing implementation can be negative on a single step, so the cumulative regret can be non-monotonic.
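A minimal sketch of the difference, assuming a Bernoulli bandit whose arm success probabilities are known to the environment. The names `probs`, `get_expected_regret`, and `get_sampled_regret` are hypothetical and chosen only for illustration; they are not taken from the actual code. The sampled version mirrors the `optimal_reward - reward` pattern described above, while the expected version follows the definition of regret.

```python
import random


def get_expected_regret(probs, action):
    """Per-step regret by definition: best expected reward minus
    the expected reward of the chosen action. Never negative."""
    return max(probs) - probs[action]


def get_sampled_regret(probs, reward):
    """Pattern described in the report: best expected reward minus
    the sampled 0/1 reward. Negative whenever reward == 1 and
    max(probs) < 1."""
    return max(probs) - reward


probs = [0.2, 0.5, 0.8]   # hypothetical arm success probabilities
rng = random.Random(0)    # fixed seed so the run is reproducible

expected_curve, sampled_curve = [], []
expected_total, sampled_total = 0.0, 0.0

for _ in range(1000):
    action = rng.randrange(len(probs))                 # uniform random policy
    reward = 1 if rng.random() < probs[action] else 0  # Bernoulli draw

    expected_total += get_expected_regret(probs, action)
    sampled_total += get_sampled_regret(probs, reward)

    expected_curve.append(expected_total)
    sampled_curve.append(sampled_total)

# Every increment of expected_curve is >= 0, so it is non-decreasing.
# sampled_curve drops on any step where the sampled reward is 1.
```

With the expected version, the cumulative curve can only stay flat or grow, which is what a regret plot is normally expected to show.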