Feature: Add Mixtral and Chess Validity Dataset

instadeepai / DebateLLM

Benchmarking Multi-Agent Debate between Language Models for Truthfulness in Q&A.

Apache License 2.0

Status: Open. DriesSmit opened this issue 5 months ago.

DriesSmit commented 5 months ago

Add a new Mixtral agent and a chess validity dataset to expand our evaluations.