WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000+ "why" question-answer-rationale triplets.
Hi, thanks for reaching out! We are currently working on version 1.2 of the dataset and are considering releasing the human evaluations at the same time. Thank you for your patience.
Hi, thanks for sharing your great work!
Would you also publicly release the human evaluation outcomes?
Best,