Hi @zetian1025, did you find a good source for this question? I've been wondering about the same thing.
Apologies for the delay in answering. I will try to check the issues more often in the future.
What @zetian1025 proposed sounds correct. The idea is to select n (e.g. 1000) test prompts and run a forward pass on each, computing the KL divergence between the current policy and the initial (reference) policy along with the score for that prompt, then average both quantities over all n prompts.
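For anyone who wants to see the averaging spelled out, here is a minimal sketch. It assumes you have already collected, for each test prompt, the per-token log-probabilities of the generated response under both the current policy and the reference policy, plus the score. The names `sequence_kl`, `evaluate`, and the dict keys are hypothetical and not taken from this repo, and the per-response KL is a sample-based estimate (sum of log-ratio over the sampled tokens), which is one common choice and not necessarily what the authors used.

```python
from statistics import mean


def sequence_kl(policy_logprobs, ref_logprobs):
    """Sample-based KL estimate for one response (assumed estimator):
    sum over sampled tokens of log pi(y_t) - log pi_ref(y_t)."""
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob lists must have the same length")
    return sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))


def evaluate(samples):
    """Average the KL estimate and the score over all test prompts.

    `samples` is a list of dicts with hypothetical keys:
    'policy_logprobs', 'ref_logprobs' (per-token floats) and 'score'.
    """
    if not samples:
        raise ValueError("need at least one sample")
    kls = [sequence_kl(s["policy_logprobs"], s["ref_logprobs"]) for s in samples]
    scores = [s["score"] for s in samples]
    return mean(kls), mean(scores)


# Toy data standing in for n test prompts (made-up numbers, n = 2 here).
toy_samples = [
    {"policy_logprobs": [-1.0, -2.0], "ref_logprobs": [-1.5, -2.5], "score": 0.5},
    {"policy_logprobs": [-0.5], "ref_logprobs": [-0.5], "score": 1.5},
]
mean_kl, mean_score = evaluate(toy_samples)
print(mean_kl, mean_score)
```

Running this once per checkpoint gives one (mean KL, mean score) pair per checkpoint, which is how I read the procedure described above.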
Am I right: