minaek / reward_design_with_llms


Negotiation task RL accuracy. #3

Open UtkarshSaxena1 opened 10 months ago

UtkarshSaxena1 commented 10 months ago

Hi. I am attempting to run the shared codebase and was testing the negotiation task from the paper with a language model of my choice (LLaMA 2). I am not sure how to obtain the RL model accuracy reported in the main paper (Fig. 4, bottom). I ran negotiation/selfplay.py but cannot tell which of the following metrics corresponds to the RL model accuracy:

" 250: dialog_len=3.78 sent_len=1.00 agree=94.00% advantage=-0.49 pareto=82.98 time=0.015s comb_rew=13.09 alice_rew=6.32 alice_sel=44.40% alice_unique=11 alice_novelty=0.73 alice_diversity=0.97 bob_rew=6.78 bob_sel=55.60% bob_unique=14 bob_novelty=0.76 bob_diversity=0.98 full_match=0.58"

jerem99 commented 1 month ago

@minaek can you please help here? Thanks