Explore literature

johnantonn commented 3 years ago

The currently implemented methods, i.e. ε-greedy and UCB, are taken from [Sutton and Barto, 2018], Ch. 2.
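For reference, here is a minimal sketch of the two Ch. 2 action-selection rules. It assumes the chapter's 10-armed Gaussian testbed (true values drawn from N(0, 1), rewards from N(q*(a), 1)) and sample-average estimates. The function names and parameter values (epsilon=0.1, c=2.0) are illustrative and are not the code in this repository.

```python
import math
import random


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a uniformly random arm, else argmax Q (random tie-break)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    best = max(q)
    return rng.choice([a for a, v in enumerate(q) if v == best])


def ucb(q, n, t, c):
    """UCB selection: argmax_a Q(a) + c * sqrt(ln t / N(a)); untried arms are played first."""
    for a, count in enumerate(n):
        if count == 0:
            return a
    scores = [q[a] + c * math.sqrt(math.log(t) / n[a]) for a in range(len(q))]
    return max(range(len(q)), key=scores.__getitem__)


def run(select, true_means, steps, rng):
    """One episode with Gaussian rewards N(q*(a), 1) and incremental sample averages."""
    k = len(true_means)
    q = [0.0] * k
    n = [0] * k
    total = 0.0
    for t in range(1, steps + 1):
        a = select(q, n, t)
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]  # Q <- Q + (R - Q) / N
        total += reward
    return q, n, total


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [rng.gauss(0.0, 1.0) for _ in range(10)]  # 10-armed testbed
    policies = {
        "epsilon-greedy": lambda q, n, t: epsilon_greedy(q, 0.1, rng),
        "UCB": lambda q, n, t: ucb(q, n, t, c=2.0),
    }
    for name, policy in policies.items():
        _, counts, total = run(policy, true_means, 1000, rng)
        print(name, "avg reward:", round(total / 1000, 3), "pulls:", counts)
```

Both rules share the same estimate update; they differ only in how they trade off exploration: ε-greedy explores uniformly at random, while UCB favours arms whose estimates are still uncertain (small N(a)).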

johnantonn commented 3 years ago

[Sutton and Barto, 2018], Gittins index, p. 65 of the PDF: an instance of Bayesian methods, which assume a known initial distribution over the action values and update the distribution exactly after each step. Actions can then be selected according to their posterior probability of being the best action.
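The sketch below does not compute the Gittins index; it only illustrates the Bayesian idea in the last sentence. It assumes Bernoulli (0/1) rewards with conjugate Beta(1, 1) priors, which makes the exact posterior update a simple count increment. Each arm's posterior probability of being the best is estimated by Monte Carlo, and an action is then sampled with those probabilities. The observations are made up for illustration.

```python
import random


def update(alpha, beta, arm, reward):
    """Exact conjugate update of a Beta(alpha, beta) posterior after a 0/1 reward."""
    alpha[arm] += reward
    beta[arm] += 1 - reward


def prob_best(alpha, beta, samples, rng):
    """Monte Carlo estimate of P(arm a has the highest mean | data) for every arm."""
    k = len(alpha)
    wins = [0] * k
    for _ in range(samples):
        draws = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        wins[max(range(k), key=draws.__getitem__)] += 1
    return [w / samples for w in wins]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha, beta = [1, 1, 1], [1, 1, 1]  # uniform Beta(1, 1) priors
    # Hypothetical (arm, reward) observations.
    for arm, reward in [(0, 1), (0, 0), (1, 1), (1, 1), (1, 1), (2, 0)]:
        update(alpha, beta, arm, reward)
    p = prob_best(alpha, beta, 10_000, rng)
    action = rng.choices(range(len(p)), weights=p)[0]  # select by P(best)
    print("P(best):", p, "chosen arm:", action)
```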

johnantonn commented 3 years ago

Studied chapters 1-12 of Lattimore and Szepesvári's Bandit Algorithms. The theoretical background is rather complex: it derives bounds on the performance of algorithms using theorems from statistics and measure theory.

The book by Aleksandrs Slivkins seems more compact and to the point: it covers the same subjects without the additional complexity. I will probably switch to it and base the background material on it as well. It includes a chapter on "Bandits with Knapsacks", which seems ideal for our setting, i.e. a budget-constrained bandit in a pure-exploration environment.
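As a rough baseline for the budgeted pure-exploration setting (not one of the bandits-with-knapsacks algorithms from the book), the sketch below assumes each arm has a known, fixed, positive cost per pull. It spends the budget by pulling arms round-robin, skipping arms it can no longer afford, and once no arm is affordable it recommends the arm with the highest empirical mean. The function name and the cost model are hypothetical.

```python
import random


def uniform_budgeted_exploration(pull, costs, budget):
    """Round-robin pulls until no arm fits the remaining budget; recommend best empirical mean.

    Hypothetical cost model: pulling arm a always costs costs[a] > 0.
    Returns (recommended_arm, pull_counts, remaining_budget).
    """
    if any(c <= 0 for c in costs):
        raise ValueError("costs must be positive")
    k = len(costs)
    sums = [0.0] * k
    counts = [0] * k
    remaining = budget
    arm = 0
    while any(c <= remaining for c in costs):
        if costs[arm] <= remaining:  # skip arms we can no longer afford
            sums[arm] += pull(arm)
            counts[arm] += 1
            remaining -= costs[arm]
        arm = (arm + 1) % k
    # Arms never pulled get -inf so they are never recommended over a sampled arm.
    means = [s / c if c else float("-inf") for s, c in zip(sums, counts)]
    best = max(range(k), key=means.__getitem__)
    return best, counts, remaining


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.2, 0.5, 0.8]  # placeholder Bernoulli arm means
    best, counts, left = uniform_budgeted_exploration(
        lambda a: 1.0 if rng.random() < probs[a] else 0.0, costs=[1, 2, 3], budget=60
    )
    print("recommended:", best, "pulls:", counts, "budget left:", left)
```

Uniform allocation ignores costs when choosing which arm to sample; a budget-aware method would shift pulls toward arms that are cheap and still uncertain, which is the kind of trade-off the knapsack formulation addresses.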

Content on Bayesian bandits and Thompson sampling is also included and seems relevant.
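A minimal Beta-Bernoulli Thompson sampling sketch follows, assuming 0/1 rewards and uniform Beta(1, 1) priors. At each step it draws one sample from every arm's posterior and plays the arm with the largest draw, which selects each arm with its posterior probability of being the best; only the played arm's posterior is updated. The simulated arm probabilities are placeholders.

```python
import random


def thompson_sampling(true_probs, steps, rng):
    """Beta-Bernoulli Thompson sampling on simulated arms.

    Returns the final posterior parameters (alpha, beta) and per-arm pull counts.
    """
    k = len(true_probs)
    alpha = [1] * k  # Beta(1, 1) uniform priors
    beta = [1] * k
    counts = [0] * k
    for _ in range(steps):
        draws = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0  # simulated Bernoulli reward
        alpha[arm] += reward
        beta[arm] += 1 - reward
        counts[arm] += 1
    return alpha, beta, counts


if __name__ == "__main__":
    alpha, beta, counts = thompson_sampling([0.3, 0.5, 0.7], 1000, random.Random(0))
    means = [a / (a + b) for a, b in zip(alpha, beta)]
    print("pulls:", counts, "posterior means:", [round(m, 3) for m in means])
```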

johnantonn / cash-for-unsupervised-ad

Explore literature #13