ntucllab / striatum

Contextual bandit in python
BSD 2-Clause "Simplified" License

Exp4.P cannot handle delayed rewards #70

Open ianlini opened 8 years ago

ianlini commented 8 years ago

Maybe we should use the score stored in the history rather than storing it in the model.
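A minimal sketch of that idea, assuming a hypothetical `DelayedExp4P` class with hypothetical `select`/`reward` methods and a caller-supplied `request_id` (this is not striatum's actual API). At decision time it saves what the Exp4.P update needs: the experts' advice, the action probabilities actually used, and the chosen action. It saves them per request, so a reward can be applied whenever it arrives, in any order. The update is the standard Exp4.P weight rule. The default `p_min` and `delta` values are illustrative assumptions.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass
class _Pending:
    """Decision-time quantities kept until the (possibly delayed) reward arrives."""
    advice: list[list[float]]  # advice[i][j]: expert i's probability for action j
    probs: list[float]         # probabilities actually used to draw the action
    action: int                # index of the chosen action


class DelayedExp4P:
    """Hypothetical Exp4.P variant that looks up decision data in a history."""

    def __init__(self, n_actions, n_experts, horizon, delta=0.1, p_min=None, seed=None):
        self.k = n_actions
        self.n = n_experts
        self.horizon = horizon
        self.delta = delta
        # p_min must lie in [0, 1/K]; sqrt(ln N / (K T)) is an assumed default.
        self.p_min = p_min if p_min is not None else min(
            1.0 / n_actions, math.sqrt(math.log(n_experts) / (n_actions * horizon)))
        self.weights = [1.0] * n_experts
        self.history = {}  # request_id -> _Pending
        self.rng = random.Random(seed)

    def select(self, request_id, advice):
        """Draw an action; each row of advice must sum to 1."""
        total = sum(self.weights)
        probs = [
            (1 - self.k * self.p_min)
            * sum(w * adv[j] for w, adv in zip(self.weights, advice)) / total
            + self.p_min
            for j in range(self.k)
        ]
        action = self.rng.choices(range(self.k), weights=probs)[0]
        self.history[request_id] = _Pending(advice, probs, action)
        return action

    def reward(self, request_id, reward):
        """Apply a reward in [0, 1] using the stored decision, not model state."""
        pending = self.history.pop(request_id)
        # Importance-weighted reward estimate: nonzero only for the chosen action.
        r_hat = [0.0] * self.k
        r_hat[pending.action] = reward / pending.probs[pending.action]
        bonus = math.sqrt(math.log(self.n / self.delta) / (self.k * self.horizon))
        for i, adv in enumerate(pending.advice):
            y_hat = sum(a * r for a, r in zip(adv, r_hat))
            # Zero-advice terms are skipped to avoid 0/0 when p_min is 0.
            v_hat = sum(a / p for a, p in zip(adv, pending.probs) if a > 0)
            self.weights[i] *= math.exp(self.p_min / 2 * (y_hat + v_hat * bonus))
        # Rescale to avoid overflow; probabilities depend only on weight ratios.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


# Usage: two requests are decided before either reward is observed.
bandit = DelayedExp4P(n_actions=3, n_experts=2, horizon=1000, seed=0)
advice = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
a1 = bandit.select("req-1", advice)
a2 = bandit.select("req-2", advice)
bandit.reward("req-2", 1.0)  # rewards may arrive out of order
bandit.reward("req-1", 0.0)
```

Later selections may use weights that do not yet reflect outstanding rewards. The importance weighting stays consistent, though, because each update divides by the probabilities that were in effect when its own action was drawn.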

yangarbiter commented 8 years ago

This algorithm was not originally designed for delayed rewards.

ianlini commented 8 years ago

LinUCB wasn't either, but we modified it to handle them...

ianlini commented 8 years ago

The current implementation of Exp3 doesn't support delayed rewards either.

ianlini commented 8 years ago

#108 is the solution for Exp3.