Open kerala21 opened 6 months ago
The jpg file is unavailable.
I was also a bit confused by that part. As I understand it, r/N in the paper seems to be a typo; the update should be Q + (r - Q)/N. To keep Q an estimate of the average score, each update moves Q toward the observed reward r by 1/N of the difference between r and the current Q.
If so, Q + (r - Q)/N can be rewritten as:
((N - 1)Q + r)/N
This represents the average of all the rewards obtained.
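A quick numerical check of this identity, as a minimal sketch. The function names and the sample rewards are made up for illustration and come from neither the paper nor the repository; `literal_paper_rule` only encodes the formula as quoted in the question.

```python
def incremental_mean(rewards):
    """Apply Q <- Q + (r - Q) / N once per reward and return the final Q."""
    q = 0.0
    for n, r in enumerate(rewards, start=1):
        q += (r - q) / n
    return q


def literal_paper_rule(rewards):
    """Apply Q <- Q + r / N, i.e. the update as quoted in the question."""
    q = 0.0
    for n, r in enumerate(rewards, start=1):
        q += r / n
    return q


rewards = [0.2, 0.9, 0.4, 0.7]  # illustrative values only
batch_mean = sum(rewards) / len(rewards)

# The corrected rule reproduces the plain average of the rewards.
assert abs(incremental_mean(rewards) - batch_mean) < 1e-12
# The literal rule does not.
assert abs(literal_paper_rule(rewards) - batch_mean) > 0.1
```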
`self.scores[i]` stores the total sum of all scores (rewards) so far. It is then divided by the counts (to calculate the average) in `get_scores()` when calculating `ucb_scores`.
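The sum-then-divide bookkeeping described here could look roughly like the sketch below. `SumCountTracker` is a hypothetical stand-in, not the repository's class. It assumes `chosen` is a list of prompt indices and `scores` the matching list of rewards, and it leaves out the UCB exploration bonus.

```python
class SumCountTracker:
    """Hypothetical stand-in that keeps a running sum and a count per prompt."""

    def __init__(self, num_prompts):
        self.scores = [0.0] * num_prompts  # running SUM of rewards, not the mean
        self.counts = [0] * num_prompts    # number of times each prompt was chosen

    def update(self, chosen, scores):
        # Assumption: chosen[k] is a prompt index and scores[k] is its reward.
        for i, s in zip(chosen, scores):
            self.scores[i] += s
            self.counts[i] += 1

    def get_scores(self):
        # Dividing the sum by the count yields the average reward Q per prompt.
        return [s / c if c > 0 else 0.0 for s, c in zip(self.scores, self.counts)]


tracker = SumCountTracker(num_prompts=2)
tracker.update(chosen=[0, 0, 1], scores=[0.2, 0.8, 0.5])
averages = tracker.get_scores()
```

Storing the sum and dividing on read gives the same Q as updating with Q + (r - Q)/N after every reward, so the two formulations differ only in where the division happens.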
In the paper's UCB algorithm, Q(p) for each prompt is updated to Q(p) + r/N(p).
The update code in the project is:
def update(self, chosen, scores):
This doesn't match the update rule in the paper.