junhwi opened 1 month ago
JudgeBench: A Benchmark for Evaluating LLM-based Judges https://arxiv.org/abs/2410.12784
OpenAI Swarm https://github.com/openai/swarm
Thinking LLMs: General Instruction Following with Thought Generation https://arxiv.org/abs//2410.10630v1