To assess long-text capabilities more comprehensively, we propose Needle-in-a-Haystack PLUS, which shifts the focus from simple fact retrieval to more challenging single-document and multi-document question answering tasks.
Greetings! Great work on using open-source language model agents to beat GPT-4 on long-context QA.
We have reproduced an agent framework based on the description in your paper, but we are unsure how to reproduce the results and visualizations. Are you planning to open-source the evaluation code soon?
After reading this paper, we would also like to reproduce the agent framework. Could you explain how to do so? Thank you very much!
Best wishes.