instadeepai / DebateLLM

Benchmarking Multi-Agent Debate between Language Models for Truthfulness in Q&A.
Apache License 2.0

Feature: Add Mixtral and Chess Validity Dataset #5

Status: Open. DriesSmit opened this issue 5 months ago.

DriesSmit commented 5 months ago

Why?

Add a new Mixtral agent and a chess validity dataset to broaden the models and tasks covered by our evaluations.