kfoofw / bandit_simulations

Bandit algorithms simulations for online learning

The lower alpha I set, the higher the click rate it returns #1

Open · hcygeorge opened this issue 3 years ago

hcygeorge commented 3 years ago

Hi, I have tried your disjoint LinUCB implementation, and I found that the lower I set alpha, the higher the CTR it returns.

When alpha = 0.01, the cumulative click-through rate converges to almost 0.9. I suspect something is wrong with the dataset, since a lower alpha means the algorithm nearly gives up on exploration.
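For context on why a small alpha suppresses exploration: in disjoint LinUCB each arm's score is its ridge-regression estimate plus an alpha-scaled confidence bonus. The sketch below assumes the standard disjoint LinUCB formulation (Li et al., 2010), not necessarily the repo's exact code; `linucb_scores` is a hypothetical helper name.

```python
import numpy as np

def linucb_scores(A_list, b_list, x, alpha):
    """Upper confidence bound for each arm under disjoint LinUCB.

    A_list[a] is arm a's d x d ridge matrix (I + sum of x x^T seen so far),
    b_list[a] is arm a's d-vector (sum of reward * x seen so far).
    """
    scores = []
    for A, b in zip(A_list, b_list):
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b                          # ridge-regression estimate
        mean = theta @ x                           # exploitation term
        bonus = alpha * np.sqrt(x @ A_inv @ x)     # exploration term, scaled by alpha
        scores.append(mean + bonus)
    return np.array(scores)

x = np.array([1.0, 0.0])
A = [np.eye(2) + 10 * np.outer(x, x), np.eye(2)]  # arm 0 seen 10 times, arm 1 never
b = [9 * x, np.zeros(2)]                           # arm 0 clicked 9 of 10 times
print(np.argmax(linucb_scores(A, b, x, alpha=0.01)))  # → 0 (exploit the known arm)
print(np.argmax(linucb_scores(A, b, x, alpha=2.0)))   # → 1 (explore the unseen arm)
```

With alpha near zero the bonus vanishes and the policy simply picks the arm with the best estimated payoff; that is only harmful when early estimates are misleading, which a low-noise dataset makes unlikely.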

Any idea why this happens and how to fix it?

kfoofw commented 3 years ago

Hi @hcygeorge, sorry for the late reply; it's been a long time since I revisited this repo.

My hypothesis is that the dataset was created by simulating 10 different contextual arms (that is, a data-generation process based on 10 contextual arms with very little "noise"). I also suspect that, with the ridge regression methodology, the model could easily find the best arm for each context. A lower alpha therefore means less exploration of other arms and more exploitation of the arms that are already doing well.

This allowed the system to reach a CTR of about 0.90 over a similar range of timesteps (I am presuming your run was about 800 to 1,100 steps).
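One way to probe this hypothesis is to replay disjoint LinUCB against a synthetic, low-noise log with 10 arms and compare alphas. This is a minimal sketch under stated assumptions: the data is hypothetical (not the homework dataset), and evaluation uses offline replay (only events where the policy's pick matches the logged arm are counted), which is a guess about how the notebook evaluates. `simulate_logs` and `replay_linucb` are illustrative names, not functions from the repo.

```python
import numpy as np

def simulate_logs(n_events=10_000, n_arms=10, d=10, noise=0.02, seed=0):
    """Hypothetical low-noise log: arms logged uniformly at random,
    reward = 1 when the logged arm's hidden linear score is positive."""
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=(n_arms, d))       # one hidden weight vector per arm
    X = rng.normal(size=(n_events, d))               # contexts
    logged_arms = rng.integers(n_arms, size=n_events)
    scores = np.einsum("ij,ij->i", X, true_theta[logged_arms])
    rewards = (scores > 0).astype(float)
    flip = rng.random(n_events) < noise              # small label noise
    rewards[flip] = 1.0 - rewards[flip]
    return X, logged_arms, rewards

def replay_linucb(X, logged_arms, rewards, alpha, n_arms=10):
    """Offline replay of disjoint LinUCB; returns (CTR, matched steps)."""
    d = X.shape[1]
    A = [np.eye(d) for _ in range(n_arms)]           # per-arm ridge matrices
    b = [np.zeros(d) for _ in range(n_arms)]         # per-arm reward-weighted sums
    clicks, matched = 0.0, 0
    for x, logged, r in zip(X, logged_arms, rewards):
        p = np.empty(n_arms)
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta = A_inv @ b[a]
            p[a] = theta @ x + alpha * np.sqrt(x @ A_inv @ x)
        if int(np.argmax(p)) != logged:
            continue                                 # policy disagrees with log: skip
        A[logged] += np.outer(x, x)                  # update only the matched arm
        b[logged] += r * x
        clicks += r
        matched += 1
    return clicks / max(matched, 1), matched

X, arms, rew = simulate_logs()
for alpha in (0.01, 0.5, 2.0):
    ctr, steps = replay_linucb(X, arms, rew, alpha)
    print(f"alpha={alpha}: CTR={ctr:.3f} over {steps} matched steps")
```

With uniform logging over 10 arms, only about one event in ten is counted, so a 10,000-event log like this one yields on the order of 1,000 replay steps. Because the rewards here are nearly deterministic given the context, there is little for extra exploration to discover, which is the situation described above.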

Let me know if that clarifies!

kfoofw commented 3 years ago

I found that the dataset was part of a homework assignment for an ML class at Columbia University. For your reference, here's the link to the assignment:

http://www.cs.columbia.edu/~jebara/6998/hw2.pdf