minimental closed this thread 1 year ago
Thanks for the kind words. I really appreciate it.
It's been a while since I worked these out, but from what I remember, the greedy strategy never explores anything other than what the agent currently believes is the best action. So it never has a chance to discover that a better action exists. In that sense, "best" is only relative to the limited knowledge of the world that the greedy agent has accumulated so far.
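To make that concrete, here is a minimal sketch (not from the original exercise, just an illustration under my own assumptions: sample-average estimates, Gaussian rewards, estimates initialized to zero) of a pure greedy agent on a two-armed bandit. Because it never explores, an early lucky draw on the worse arm can keep it pulling that arm:

```python
import random

def run_greedy(true_means, steps=1000, seed=0):
    """Pure greedy agent: always pulls the arm with the highest estimate."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # sample-average value estimates, initialized to 0
    counts = [0] * n
    for _ in range(steps):
        # Break ties randomly; otherwise always exploit the current estimate.
        best = max(range(n), key=lambda a: (estimates[a], rng.random()))
        reward = rng.gauss(true_means[best], 1.0)
        counts[best] += 1
        # Incremental sample-average update.
        estimates[best] += (reward - estimates[best]) / counts[best]
    return counts

# Arm 1 is truly better, but the greedy agent may never find that out:
# whichever arm happens to return a positive reward first keeps getting pulled.
print(run_greedy([0.0, 1.0]))
```

Depending on the seed, the agent can spend most of its pulls on the inferior arm, which is exactly the failure mode you describe.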
Hi Vojta,
thank you for the quick reply, and please excuse the delay. It took me a while to understand your reasoning/your formula for epsilon > 0, and I came to the same conclusion. The likelihood of choosing the best action for epsilon > 0 is
1 - (1-1/n) * epsilon,
where n is the number of actions, and epsilon is the frequency for picking actions at random.
My only issue really is that the above formula does not hold for epsilon = 0. If you rely solely on greedy choices, the probability that the agent has actually identified (and therefore chooses) the best action is pretty involved to compute. See this article for an upper bound.
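For the epsilon > 0 case, the formula is easy to check empirically. This is a quick Monte-Carlo sketch of my own (assuming, as in the derivation, that the agent's greedy choice already coincides with the true best action): with probability epsilon we pick uniformly among all n arms, otherwise we exploit, so the best arm is chosen with probability (1 - epsilon) + epsilon/n = 1 - (1 - 1/n) * epsilon:

```python
import random

def epsilon_greedy_pick(best, n, epsilon):
    # Explore uniformly with probability epsilon, otherwise exploit.
    if random.random() < epsilon:
        return random.randrange(n)
    return best

def empirical_best_prob(n, epsilon, trials=200_000):
    best = 0  # assume the agent's estimate already points at the true best arm
    hits = sum(epsilon_greedy_pick(best, n, epsilon) == best
               for _ in range(trials))
    return hits / trials

n, epsilon = 10, 0.1
analytic = 1 - (1 - 1/n) * epsilon   # = 0.91 for n=10, epsilon=0.1
print(empirical_best_prob(n, epsilon), analytic)
```

With 200k trials the empirical frequency lands within a fraction of a percent of 0.91, so the formula checks out for epsilon > 0.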
Dear Vojta,
thank you for publishing your solutions! Great move. Most likely it is due to my lack of understanding of statistics, but in your solution to exercise 2.3 you say that the probability of picking the best action is

P(A_t = a*) = 1 - (1 - 1/n) * epsilon
But if epsilon = 0 (pure greedy), that formula says the probability of picking the best action is 1. Since the greedy agent's value estimates depend on random early rewards, and it never explores to correct a wrong estimate, I would expect the probability to be considerably less than 1.