Open paulbricman opened 1 year ago
Relevant function for sanitization: https://github.com/paulbricman/DebateGPT/blob/relative-evaluation/debategpt/training/reward.py#L137
Implementing the sanitization function caused the scores of many propositions to drop to zero, including many that are mostly well-formatted but contain extraneous punctuation such as colons and quotation marks. As a result, the two parties' scores often sum to less than one, and in many cases both parties end up with a final score of zero. I am wondering whether it would make sense to relax the sanitization requirements, for example by allowing more types of punctuation, and then re-normalize the scores.
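As a rough sketch of what "relax, then re-normalize" could look like (the allowed punctuation set and function names below are assumptions for illustration, not the actual `reward.py` implementation):

```python
import string

# Punctuation we might additionally tolerate beyond a basic set
# (hypothetical choice; tune to taste).
ALLOWED_EXTRA = set(':;"\'')

def sanitize_score(utterance: str, score: float) -> float:
    """Zero out a score only if the utterance contains characters outside
    letters, digits, whitespace, basic punctuation, and the relaxed set."""
    allowed = set(string.ascii_letters + string.digits
                  + string.whitespace + ".,!?-") | ALLOWED_EXTRA
    return score if all(ch in allowed for ch in utterance) else 0.0

def renormalize(scores: list[float]) -> list[float]:
    """Rescale the two parties' scores to sum to one; fall back to a
    uniform split when both were zeroed out."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]
```

With this relaxation, an utterance like `He said: "no"` would keep its rating, while the pair would still be rescaled so the scores add up to one.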
Given two model names (which could be the same, e.g. `distilgpt2`), load them and use each to "power" one of two parties engaged in debate, using the `Debate` object. It might require a bit of messing around with the `Debate` object, though, because it has been designed with one model in mind. For instance, the new function could work with two such objects which are manually kept in sync, each powered by one of the model names.

The function should return a list of the party ratings for each of `n_branches` debates, something like `[[0.4, 0.6], [0.7, 0.3]]`. It'll be straightforward to then interpret those in a more meaningful way. I think it'd be appropriate to also sanitize the scores, as described in the artifact (i.e. setting individual utterance ratings to zero if they fail to satisfy a few cosmetic constraints).

Relevant artifact sections: ArgRank, Obtaining DebateGPT
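The control flow could be sketched roughly as below. This is only an outline: the real version would load the two Hugging Face models and drive `Debate` objects with them, whereas here each party is abstracted as a plain callable so the skeleton runs standalone, and `rate_utterance` is a placeholder for the actual rating step, not a DebateGPT API:

```python
from typing import Callable, List

# A party maps the transcript so far to its next utterance.
Party = Callable[[str], str]

def run_debates(party_a: Party, party_b: Party,
                rate_utterance: Callable[[str], float],
                n_branches: int, n_rounds: int = 2) -> List[List[float]]:
    """Return normalized [rating_a, rating_b] for each of n_branches debates."""
    results = []
    for _ in range(n_branches):
        transcript, scores = "", [0.0, 0.0]
        for _ in range(n_rounds):
            # Parties alternate turns; both see the shared transcript,
            # which is how the two objects would be kept "in sync".
            for i, party in enumerate((party_a, party_b)):
                utterance = party(transcript)
                transcript += utterance + "\n"
                scores[i] += rate_utterance(utterance)
        total = sum(scores) or 1.0  # avoid division by zero if all zeroed
        results.append([s / total for s in scores])
    return results
```

Plugging in two model-backed parties would then yield output of the shape described above, e.g. `[[0.4, 0.6], [0.7, 0.3]]` for two branches.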