paulbricman / DebateGPT

Implementation of initial ArgRank and DebateGPT prototypes.
https://paulbricman.com/defensibility/
MIT License

Implement relative debate evaluation between models #3

Open paulbricman opened 1 year ago

paulbricman commented 1 year ago

Given two model names (which could be the same, e.g. distilgpt2), load them and use each to "power" one of the two parties engaged in a debate, using the Debate object. This might require some reworking of the Debate object, because it was designed with a single model in mind. For instance, the new function could work with two such objects which are kept manually in sync, each powered by one of the model names, as sketched below.
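A minimal sketch of that approach, assuming Hugging Face `transformers` for loading. The `Debate` interface is treated as a placeholder here: `step()`, `transplant()`, and `party_scores()` are hypothetical names standing in for whatever the real object exposes for generating an utterance, mirroring it into the other object, and reading out the ArgRank ratings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def relative_eval(model_a_name: str, model_b_name: str,
                  n_branches: int = 4, n_rounds: int = 3) -> list:
    """Pit two models against each other across n_branches debates and
    return one [party_a_rating, party_b_rating] pair per branch."""
    # The two names may be identical (e.g. "distilgpt2"), in which case
    # the model effectively debates itself.
    models = [
        (AutoModelForCausalLM.from_pretrained(name),
         AutoTokenizer.from_pretrained(name))
        for name in (model_a_name, model_b_name)
    ]

    ratings = []
    for _ in range(n_branches):
        # One Debate object per model, kept manually in sync: whatever one
        # party says in its own object is copied into the other object.
        # Debate, step(), transplant(), and party_scores() are hypothetical.
        debates = [Debate(model=m, tokenizer=t) for m, t in models]
        for _ in range(n_rounds):
            for party in (0, 1):
                utterance = debates[party].step(party)
                debates[1 - party].transplant(party, utterance)
        # Both objects now hold the same transcript; score either one.
        ratings.append(debates[0].party_scores())
    return ratings
```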

The function should return a list of the party ratings for each of the n_branches debates, something like [[0.4, 0.6], [0.7, 0.3]]. It will then be straightforward to interpret those in a more meaningful way. I think it would also be appropriate to sanitize the scores, as described in the artifact (i.e. setting individual utterance ratings to zero if they fail to satisfy a few cosmetic constraints).
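For instance, the returned list could be interpreted along these lines (a sketch; the aggregation choices are illustrative, not prescribed by the artifact):

```python
import numpy as np

# Example output: one [party_a, party_b] rating pair per branch.
ratings = np.array([[0.4, 0.6], [0.7, 0.3]])

mean_per_party = ratings.mean(axis=0)  # e.g. [0.55, 0.45]
# Fraction of branches in which party A outscores party B.
win_rate_a = float((ratings[:, 0] > ratings[:, 1]).mean())

print(mean_per_party, win_rate_a)
```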

Relevant artifact sections: ArgRank, Obtaining DebateGPT

paulbricman commented 1 year ago

Relevant function for sanitization: https://github.com/paulbricman/DebateGPT/blob/relative-evaluation/debategpt/training/reward.py#L137
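For readers without the repo at hand, the check roughly amounts to something like the following. This is a hypothetical reconstruction, not the linked code; the exact constraints live in reward.py.

```python
import re

# Characters allowed in a "cosmetically valid" utterance; the real
# constraint set lives in reward.py and may differ.
ALLOWED = re.compile(r"[A-Za-z0-9 ,.!?'\-]+")


def passes_sanitization(utterance: str) -> bool:
    """Return True if the utterance keeps its rating, False if the rating
    should be zeroed out. Hypothetical stand-in for the linked function."""
    s = utterance.strip()
    return (
        bool(s)
        and s[0].isupper()               # starts with a capital letter
        and s.endswith((".", "!", "?"))  # ends with terminal punctuation
        and ALLOWED.fullmatch(s) is not None
    )
```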

y-mx commented 1 year ago

Implementing the sanitization function caused the scores of many propositions to drop to zero, including many that are mostly well-formatted but contain extraneous punctuation such as colons and quotation marks. As a result, the scores of the two parties often sum to less than one, and in many cases both parties end up with a final score of zero. Would it make sense to relax the sanitization requirements, for example by allowing more types of punctuation, and then re-normalize the scores? A sketch of that idea follows below.
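One possible shape for that fix, as a sketch: widen the allowed character set so that colons and quotation marks no longer zero out an utterance, then rescale each branch's pair of party scores so they sum to one, with an explicit convention for the all-zero case.

```python
import re

# Relaxed character set: colons, semicolons, and quotation marks now pass.
RELAXED_ALLOWED = re.compile(r"[A-Za-z0-9 ,.!?:;'\"\-]+")


def renormalize(pair: list) -> list:
    """Rescale a branch's party scores to sum to one after sanitization has
    zeroed some utterance ratings. Treating the all-zero case as a draw is
    one possible convention, not something the artifact prescribes."""
    total = sum(pair)
    if total == 0.0:
        return [1.0 / len(pair)] * len(pair)
    return [score / total for score in pair]


print(renormalize([0.3, 0.1]))  # -> [0.75, 0.25]
print(renormalize([0.0, 0.0]))  # -> [0.5, 0.5]
```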