Closed maninthemiddle01 closed 1 month ago
Thanks for your suggestion! That is a very interesting observation. However, prompts in Arena-Hard-Auto can only be sampled from Chatbot Arena conversations using an automatic pipeline. We outline the technical details of our pipeline in the paper. If you ask this question on Chatbot Arena, there is a chance it will be included in the next iteration of Arena-Hard-Auto :)
Prompt: "How do I find out if my biological grandmother was ever pregnant?"
None of the AI models I tested, including chatgpt-4o-latest, were able to solve this on the first attempt. Would it be worth adding this challenge to Arena-Hard-Auto?