ucl-dark / llm_debate

Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
https://llm-debate.com/
MIT License

About the `<u_quote>` tag #2

Open tqzhong opened 3 months ago

tqzhong commented 3 months ago

Hi, I have some questions about the `<u_quote>` tag. You mentioned in your prompt that a `<quote>` tag is converted to `<u_quote>` when the quote doesn't pass verification through direct string matching. Sometimes LLMs (like GPT-4) select quotes from stories that do not exactly match the original strings, but whose meaning is consistent. In such cases, would directly tagging these strings with `<u_quote>` affect the judge's final judgment?

jplhughes commented 3 months ago

We do normalise the text by stripping punctuation and lowercasing. This catches many cases where the LLM doesn't give the exact quote. However, you're right that sometimes small, meaningless differences will cause a quote to be marked as unverified, and this does affect the final judgement. We decided we're OK with this, since as debaters get stronger (e.g. a better base model or higher BoN) they should be incentivised to use more accurate quotes. Since best-of-N (BoN) sampling uses the judge prompt, a debater who uses slightly wrong quotes won't persuade the judge as much, so BoN will incentivise verified quotes.
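
For concreteness, here is a minimal sketch of that check, assuming a `<v_quote>` tag for verified quotes and with invented helper names (the repo's actual implementation may differ):

```python
import string


def normalise(text: str) -> str:
    # Lowercase, drop punctuation, and collapse whitespace so that
    # trivial formatting differences don't break the match.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def tag_quote(quote: str, story: str) -> str:
    # A quote is verified if its normalised form appears verbatim in the
    # normalised story; otherwise it is flagged so the judge can discount it.
    if normalise(quote) in normalise(story):
        return f"<v_quote>{quote}</v_quote>"
    return f"<u_quote>{quote}</u_quote>"
```

Because the judge prompt explains that `<u_quote>` spans failed verification, a debater sampled with best-of-N against that judge is pushed towards quotes that survive this check.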

tqzhong commented 3 months ago

Thanks for your kind explanation! I have another question. I understand that the motivation lies in the debate judge's ability to improve the accuracy of LLM responses when the background (the stories) is unknown. This leaves me with two points of confusion:

  1. Between the debate judge system and a single LLM that knows the background, which one has the advantage (higher accuracy)?
  2. In the judge model's prompt, where the background is not given, would providing relevant background information as further assistance increase the judge's accuracy?

Thanks again!

jplhughes commented 3 months ago

> Between the debate judge system and a single LLM that knows the background, which one has the advantage (higher accuracy)?

If you look at our "expert" baseline in Figure 1, that shows the accuracy when the judge is provided with the story. It is of course higher, but we're using this toy setting with information asymmetry to understand how a non-expert could supervise stronger models in the future, when we don't have ground-truth labels. In that setting you wouldn't be able to calculate accuracy on your task, so the question becomes how we supervise such a system so that it gets better with humans. Debate is one way that could be possible.

> In the judge model's prompt, where the background is not given, would providing relevant background information as further assistance increase the judge's accuracy?

We provide background information such as the fact that the judge is in a debate setting, that it will see arguments from two debaters, and that the debaters are arguing over a question with two answer choices. Providing this context does help judge accuracy, and explaining the quoting system helps too. We have prompt ablations in the Appendix if you're interested in the other prompt engineering we did. If by background information you mean the story, then see my answer to your first question.
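
As a rough illustration (the wording below is invented for this sketch, not our actual prompt; see the Appendix ablations for what we really tested), the judge prompt carries context along these lines:

```python
# Hypothetical judge prompt, illustrating the kind of context described above.
JUDGE_PROMPT = """\
You are judging a debate. Two debaters each defend one of two answer
choices to the question below. You cannot see the underlying story,
only their arguments.

Text in <v_quote> tags was verified verbatim against the story.
Text in <u_quote> tags failed verification and may be inaccurate.

Question: {question}
Choices: (A) {answer_a}  (B) {answer_b}

Transcript:
{transcript}

Which answer is correct? Reply with A or B and a short justification.
"""


def build_judge_prompt(question: str, answer_a: str, answer_b: str, transcript: str) -> str:
    return JUDGE_PROMPT.format(
        question=question, answer_a=answer_a, answer_b=answer_b, transcript=transcript
    )
```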

tqzhong commented 3 months ago

Thanks a lot. I am now considering applying your system to court debates, where the 'stories' would be various legal statutes. The prosecutor and the lawyer would debate, and logically the judge would also understand these legal statutes (the 'stories'). Therefore, I am considering letting the judge see both the debate transcript and the stories.

> If you look at our "expert" baseline in Figure 1, that shows the accuracy when the judge is provided with the story.

So what you meant earlier is that the expert in Figure 1 is a judge that sees both the transcript and the stories? (Previously, I understood that the judge only saw the stories, without the debate transcript.)

jplhughes commented 3 months ago

That is cool! Keep us in the loop. The expert in Figure 1 is just the judge seeing the story; it doesn't see the debate. We don't run the story + debate baseline, which is what you want to run. I would expect it not to help accuracy, since the debates will confuse the LLM more compared to just working it out from the story (see Appendix C of our write-up on self-improvement with debate). Remember that our primary motivation is scalable oversight, so in the future the judge won't have access to the information/capability of the expert models that are debating. But for your use case (if you don't care about scalable oversight) it seems interesting to try.
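
If you do try it, the only change is what the judge gets to see. A minimal sketch of the three configurations (names are illustrative, not from this repo):

```python
from enum import Enum


class JudgeCondition(Enum):
    DEBATE_ONLY = "debate"       # main setting: information asymmetry
    EXPERT = "expert"            # Figure 1 baseline: judge sees only the story
    STORY_AND_DEBATE = "both"    # the untested variant discussed above


def judge_context(condition: JudgeCondition, story: str, transcript: str) -> str:
    # Assemble the evidence shown to the judge for the chosen condition.
    if condition is JudgeCondition.DEBATE_ONLY:
        return transcript
    if condition is JudgeCondition.EXPERT:
        return story
    return f"Story:\n{story}\n\nDebate transcript:\n{transcript}"
```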

tqzhong commented 3 months ago

Of course, and thanks again!