zhudotexe / fanoutqa

Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024)
https://fanoutqa.com/
MIT License

Single-Answer Questions? #10

Open nbalepur opened 4 weeks ago

nbalepur commented 4 weeks ago

All of the examples shown in the paper imply that the questions have a single answer, but most of the questions are actually several QA pairs combined into one (i.e., you need to provide several answers to be correct).

Is this right? I can't find many examples like "What is the total number of employees in the five largest banks in the world?" that require synthesizing multiple sub-answers into a final answer.

zhudotexe commented 4 weeks ago

Yes, that's right: not all of the questions are single-answer (in fact, I think most of them are not). In Appendix A of our paper, most of the examples we provide are questions whose answers are key-value pairs.

nbalepur commented 4 weeks ago

I wish this were clearer in the paper: Figure 1, Sec 4.2, and the use of "top-level answer" made me believe that the goal was to synthesize subanswers into a single answer (versus just string concatenation). If you're curious, only 24/334 of the dev set questions have a single answer. Maybe this could be made clearer in the repo?

Thanks anyway, the work is interesting, and it would be cool to have a version that does require combining and reasoning over these subanswers to produce a single answer.
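For anyone who wants to run a similar tally, here is a minimal sketch of how a count like the 24/334 above could be computed. It assumes each question is a dict with an `answer` field that is either a scalar (single answer) or a container such as a dict or list (multi-part answer). The helper names and the toy records are made up for illustration and are not taken from the dataset or the `fanoutqa` package.

```python
from typing import Any


def is_single_answer(answer: Any) -> bool:
    """Treat scalars (str, int, float, bool) as single answers.

    Assumption: dicts, lists, tuples, and sets are multi-part answers,
    even if they happen to hold only one element.
    """
    return not isinstance(answer, (dict, list, tuple, set))


def count_single_answer(questions: list[dict]) -> tuple[int, int]:
    """Return (number of single-answer questions, total questions)."""
    single = sum(1 for q in questions if is_single_answer(q["answer"]))
    return single, len(questions)


# Toy records, invented for illustration only (not real dataset entries).
toy_questions = [
    {"question": "toy question A", "answer": {"item 1": 1, "item 2": 2}},
    {"question": "toy question B", "answer": 42},
    {"question": "toy question C", "answer": ["x", "y"]},
]

single, total = count_single_answer(toy_questions)
print(f"{single}/{total} questions have a single answer")
```

To apply this to the real dev set, the same function would be run over the loaded question records, after checking how answers are actually typed there.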

zhudotexe commented 4 weeks ago

Gotcha. The main thing we wanted to focus on was the fan-out operation itself and how that requires multi-turn retrieval (in the Open Book setting at least); in another current work we're using this dataset to compare single-agent vs multi-agent systems, which influenced some of the wording there. So there is indeed a stronger focus on the fan-out part rather than the fan-in/aggregation task (IMO, in a multi-agent system where in theory each subquery is handled by an individual agent, this reduces to a fairly standard reading comprehension/reasoning task à la GSM8K etc). This lets us isolate the challenges in the task more easily, but I agree that it would be more challenging if each question also involved an aggregation step.
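To make the fan-out vs. fan-in distinction concrete, here is a rough sketch using the "total number of employees" style of question from earlier in the thread. The fan-out step produces one sub-answer per entity (the key-value form most questions use), and the optional fan-in step aggregates those into a single value. The function names, the entity labels, and the numbers are all hypothetical placeholders, not real data or part of the FanOutQA code.

```python
from typing import Callable, Iterable


def fan_out(entities: Iterable[str], sub_query: Callable[[str], int]) -> dict[str, int]:
    """Ask the same sub-question about every entity; return key-value sub-answers."""
    return {entity: sub_query(entity) for entity in entities}


def fan_in(sub_answers: dict[str, int],
           aggregate: Callable[[Iterable[int]], int] = sum) -> int:
    """Combine the sub-answers into one final answer (here: a sum by default)."""
    return aggregate(sub_answers.values())


# Placeholder lookup standing in for retrieval; the values are made up.
toy_employee_counts = {"Bank A": 100, "Bank B": 250, "Bank C": 50}

# Fan-out only: the answer is the key-value mapping itself.
sub_answers = fan_out(toy_employee_counts.keys(), toy_employee_counts.__getitem__)

# Fan-out plus fan-in: the answer is a single aggregated value.
total_employees = fan_in(sub_answers)

print(sub_answers)
print(total_employees)
```

In this framing, a question whose expected answer is `sub_answers` exercises only the fan-out, while one whose expected answer is `total_employees` adds the aggregation step discussed above.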