A fixed epsilon, the constant exploration probability in a simple epsilon-greedy strategy, prevents the strategy from getting arbitrarily close to the optimal lever: it keeps pulling random levers at the same rate no matter how much it has already learned.
To solve this problem, some papers suggest a natural variant of the epsilon-greedy strategy, called the "epsilon-decreasing strategy" (Ref1 (Auer et al.), Ref2 (Vermorel et al.)).
It may be good to add an option for decreasing epsilon to the epsilon-greedy method.
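As a rough illustration of what such an option could look like, here is a minimal sketch. It is not this project's API: the class `EpsilonDecreasingBandit`, the helper `decayed_epsilon`, and the parameter `epsilon0` are hypothetical names, and the schedule `min(1, epsilon0 / t)` is one simple decay assumed here for illustration; the cited papers should be checked for the exact schedules they analyze.

```python
import random


def decayed_epsilon(epsilon0, t):
    """Exploration probability at round t (1-indexed).

    Assumed schedule: min(1, epsilon0 / t). Other decays are possible.
    """
    return min(1.0, epsilon0 / t)


class EpsilonDecreasingBandit:
    """Hypothetical epsilon-greedy variant whose epsilon shrinks over time."""

    def __init__(self, n_arms, epsilon0=1.0, seed=None):
        self.epsilon0 = epsilon0
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.t = 0                    # number of rounds played so far
        self.rng = random.Random(seed)

    def select_arm(self):
        self.t += 1
        eps = decayed_epsilon(self.epsilon0, self.t)
        if self.rng.random() < eps:
            # Explore: pick an arm uniformly at random.
            return self.rng.randrange(len(self.counts))
        # Exploit: pick the arm with the highest estimated mean reward.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean for the pulled arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With `epsilon0=1.0` the first round is always exploratory, and the chance of exploring drops to 0.25 by round 4 and 0.01 by round 100, so the strategy spends less and less time on random levers as its estimates improve.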