ntucllab / striatum

Contextual bandit in python
BSD 2-Clause "Simplified" License

default p_min #64

Open ianlini opened 8 years ago

ianlini commented 8 years ago

https://github.com/ntucllab/striatum/blob/master/striatum/bandit/exp4p.py#L64 What's this? Any reference?

yangarbiter commented 8 years ago

It should be this one https://github.com/ntucllab/libact/blob/master/libact/query_strategies/active_learning_by_learning.py#L337

ianlini commented 8 years ago

How do we define T?

taweihuang commented 8 years ago

The problem here is that we do not know the exact values of N and T at initialization. T is the total number of recommendation rounds, while N is the number of experts.
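For reference, a minimal sketch of how such a default could be computed once N and T are known. It assumes the setting from the Exp4.P paper (Beygelzimer et al., 2011), p_min = sqrt(ln N / (K T)), where K is the number of actions. The function name `default_p_min` and its parameters are hypothetical and are not part of striatum's API.

```python
import math


def default_p_min(n_experts, n_actions, n_rounds):
    """Return sqrt(ln(N) / (K * T)), capped at 1 / K.

    Assumption: this follows the p_min setting from the Exp4.P paper.
    The function itself is illustrative, not striatum's implementation.
    """
    if n_experts < 1 or n_actions < 1 or n_rounds < 1:
        raise ValueError("n_experts, n_actions and n_rounds must be >= 1")
    p_min = math.sqrt(math.log(n_experts) / (n_actions * n_rounds))
    # Exp4.P requires p_min to lie in [0, 1 / K]; a small T can violate this.
    return min(p_min, 1.0 / n_actions)


# Example: 10 experts, 5 actions, a horizon of 10000 rounds.
print(default_p_min(n_experts=10, n_actions=5, n_rounds=10000))
```

Both N and T enter the formula directly, so a value computed at initialization is only meaningful if both are fixed in advance or estimated.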

ianlini commented 8 years ago

I think we should fix N. What happens if we have more than T rounds?

ianlini commented 8 years ago

I think the sets of actions and experts should both be fixed. I don't think Exp4.P can reasonably handle changes to either. This would be a big change; any ideas? @yangarbiter @stegben @SoluMilken

taweihuang commented 8 years ago

Yeah, the original Exp4.P cannot handle new actions or new experts. For new actions, I think it's still okay if we retrain our experts. For new experts, though, I don't think the original algorithm can handle the case.

ianlini commented 8 years ago

After retraining the experts, I don't think the existing weights would still be meaningful, and the weight for a new action is also a problem.
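To make the concern concrete, here is a rough sketch of how Exp4.P turns per-expert weights into action probabilities, assuming the standard formulation from the paper: p_j = (1 - K * p_min) * sum_i(w_i * xi_ij) / sum_i(w_i) + p_min. The function name `exp4p_action_probs` is hypothetical and is not striatum's code.

```python
import numpy as np


def exp4p_action_probs(weights, advice, p_min):
    """Mix expert advice into action probabilities (illustrative sketch).

    weights: shape (N,), one positive weight per expert.
    advice:  shape (N, K), each row is one expert's distribution over actions.
    p_min:   minimum probability per action, in [0, 1 / K].
    """
    weights = np.asarray(weights, dtype=float)
    advice = np.asarray(advice, dtype=float)
    n_experts, n_actions = advice.shape
    if weights.shape != (n_experts,):
        raise ValueError("need exactly one weight per expert")
    if not 0.0 <= p_min <= 1.0 / n_actions:
        raise ValueError("p_min must be in [0, 1 / K]")
    # Weighted average of the experts' advice, then mixed with uniform exploration.
    mixture = weights @ advice / weights.sum()
    return (1.0 - n_actions * p_min) * mixture + p_min


print(exp4p_action_probs([3.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], p_min=0.1))
```

Each weight is tied to one expert's past advice and each advice row has exactly K entries, so adding an expert needs an initial weight the algorithm does not define, and adding an action changes the shape of every expert's advice.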