Open ianlini opened 8 years ago
Maybe we should use the score kept in the history instead of storing it in the model.
This algorithm was not originally designed for delayed rewards.
linucb wasn't either, but we modified it to support them...
The current implementation of exp3 also doesn't support delayed rewards.
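To make the idea concrete, here is a rough sketch of what "keep the score in the history" could look like for an Exp3-style bandit. It is not the project's actual code: the class `Exp3WithHistory`, its methods `get_action` and `reward`, and the plain dict used as history storage are all hypothetical names made up for illustration. The point is only that the probability used at decision time is saved with the history entry, so a reward that arrives later is weighted by that saved value rather than by whatever the model's probabilities are at that moment.

```python
import math
import random


class Exp3WithHistory:
    """Toy Exp3 (hypothetical, for illustration only).

    The model holds just the weights. The probability with which each
    action was chosen is stored in the history, keyed by a history id.
    """

    def __init__(self, n_actions, gamma=0.1, seed=None):
        self.n_actions = n_actions
        self.gamma = gamma
        self.weights = [1.0] * n_actions  # model state
        self.history = {}  # history_id -> (action, probability at decision time)
        self._next_id = 0
        self._rng = random.Random(seed)

    def _probabilities(self):
        # Standard Exp3 mixing of normalized weights with uniform exploration.
        total = sum(self.weights)
        return [
            (1 - self.gamma) * w / total + self.gamma / self.n_actions
            for w in self.weights
        ]

    def get_action(self):
        probs = self._probabilities()
        action = self._rng.choices(range(self.n_actions), weights=probs)[0]
        history_id = self._next_id
        self._next_id += 1
        # Save the score in the history, not in the model.
        self.history[history_id] = (action, probs[action])
        return history_id, action

    def reward(self, history_id, reward):
        # The reward may arrive after other decisions and updates, so use
        # the probability recorded when this decision was made.
        action, prob = self.history.pop(history_id)
        estimated = reward / prob  # importance-weighted reward estimate
        self.weights[action] *= math.exp(self.gamma * estimated / self.n_actions)
```

With this layout, two recommendations can be issued back to back and rewarded later in any order: each update reads its own saved probability from the history entry.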