Question about LongBench test results.

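I ran longbench.sh with the original, unmodified configuration. The test results are shown above, and they differ substantially from the numbers reported in the paper. Is this expected?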

config=config/mistral-inf-llm.yaml

datasets="narrativeqa,qasper,multifieldqa_en,\
hotpotqa,2wikimqa,musique,\
gov_report,qmsum,multi_news,\
trec,triviaqa,samsum,\
passage_count,passage_retrieval_en,\
lcc,repobench-p"

mkdir benchmark/longbench-result

python3 benchmark/pred.py \
--config_path ${config} \
--output_dir_path benchmark/longbench-result \
--datasets ${datasets}

python3 benchmark/eval.py --dir_path benchmark/longbench-result
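One thing that may be worth ruling out before comparing against the paper is whether every dataset actually produced predictions. The sketch below is a hypothetical helper, not part of the repository: `missing_datasets` is a name made up for illustration, and it assumes that `pred.py` writes one non-empty `<dataset>.jsonl` file per dataset into the output directory. If the real layout differs, the file pattern needs adjusting.

```shell
# Hypothetical helper (not part of InfLLM): print every dataset name that has
# no non-empty prediction file in the result directory.
# ASSUMPTION: predictions are stored as <result_dir>/<dataset>.jsonl.
# The body runs in a subshell so the IFS change does not leak to the caller.
missing_datasets() (
    result_dir="$1"
    IFS=','
    for name in $2; do
        # Drop stray whitespace that a broken line continuation may leave behind.
        name=$(printf '%s' "$name" | tr -d '[:space:]')
        [ -n "$name" ] || continue
        # -s is true only if the file exists and is not empty.
        [ -s "${result_dir}/${name}.jsonl" ] || printf '%s\n' "$name"
    done
)
```

It could be called as `missing_datasets benchmark/longbench-result "${datasets}"` between the `pred.py` and `eval.py` steps; empty output would mean every dataset has a prediction file under the assumed naming.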
