Formula/code discrepancy in Chapter 4?

mimoralea / gdrl

Grokking Deep Reinforcement Learning

https://www.manning.com/books/grokking-deep-reinforcement-learning

BSD 3-Clause "New" or "Revised" License


Formula/code discrepancy in Chapter 4? #18

Closed steveant closed 2 years ago

steveant commented 2 years ago

Really loving this book!! Doing plenty of reading and re-reading not to miss a beat.

I noticed a discrepancy between the formula and the code for Upper Confidence Bound (UCB) in Chapter 4. The hyperparameter 'c' is outside the square root in the equation:

but inside the square root in the code:

mimoralea commented 2 years ago

Hey, @steveant. Great find! Have you tested the code with the c outside the sqrt? That should be the correct implementation!
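
One quick way to test this is to compare the two exploration bonuses side by side. The snippet below is only a minimal sketch, assuming the standard UCB form where the bonus is `c * sqrt(ln t / N)`. The function names, the step count, and the pull counts are hypothetical and are not taken from the book's notebook.

```python
import numpy as np

def ucb_bonus_c_outside(t, counts, c=2.0):
    """Bonus with c outside the square root: c * sqrt(ln t / N)."""
    return c * np.sqrt(np.log(t) / counts)

def ucb_bonus_c_inside(t, counts, c=2.0):
    """Bonus with c inside the square root: sqrt(c * ln t / N)."""
    return np.sqrt(c * np.log(t) / counts)

# Hypothetical state: step 100, three arms pulled 10, 30 and 60 times.
t = 100
counts = np.array([10.0, 30.0, 60.0])

outside = ucb_bonus_c_outside(t, counts)
inside = ucb_bonus_c_inside(t, counts)

# The two bonuses differ by a constant factor of sqrt(c) for every arm.
ratio = outside / inside
print(ratio)
```

Since the two versions differ only by a factor of `sqrt(c)`, putting `c` inside the square root behaves like the c-outside version with a smaller effective `c` (for `c > 1`). The estimates `Q` are not scaled by that factor, so the selected action can still differ between the two.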