Could you please opensource evaluation code

zuucan / NeedleInAHaystack-PLUS

To assess long-text capabilities more comprehensively, we propose Needle-in-a-Haystack PLUS, which shifts the focus from simple fact retrieval to more challenging single-document and multi-document question answering tasks.

Could you please opensource evaluation code #2

Open KevinCL16 opened 2 months ago

KevinCL16 commented 2 months ago

Greetings! Great work on using open-source language model agents to beat GPT-4 on long-context QA.

We have reproduced an agent framework based on the description in your paper, but we are unsure how to reproduce the results and visualizations. Are you planning to open-source the evaluation code soon?

Best wishes.

w18731337090 commented 3 weeks ago

We would also like to reproduce the agent framework described in this paper. Could you tell us how to reproduce it? Thank you very much!