thunlp / InfLLM

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
MIT License

Question about LongBench test results #39

Closed tiaotiaosong closed 4 months ago

tiaotiaosong commented 4 months ago
[screenshot: LongBench evaluation results]

I ran longbench.sh with the original configuration and got the results shown above, which differ noticeably from those reported in the paper. Is this expected?

config=config/mistral-inf-llm.yaml

datasets="narrativeqa,qasper,multifieldqa_en,\
hotpotqa,2wikimqa,musique,\
gov_report,qmsum,multi_news,\
trec,triviaqa,samsum,\
passage_count,passage_retrieval_en,\
lcc,repobench-p"

mkdir benchmark/longbench-result

python3 benchmark/pred.py \
    --config_path ${config} \
    --output_dir_path benchmark/longbench-result \
    --datasets ${datasets}

python3 benchmark/eval.py --dir_path benchmark/longbench-result

tiaotiaosong commented 4 months ago

The datasets for this run were downloaded from ModelScope, so the discrepancy is most likely caused by changes in the data.
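When a score gap is suspected to come from a different copy of the data rather than the model, a quick way to confirm is to fingerprint the local dataset files and compare them against a known-good download. The sketch below is a hypothetical helper (not part of the InfLLM repo) that assumes each LongBench task is stored as a `.jsonl` file; the function and directory names are illustrative only.

```python
import hashlib
from pathlib import Path


def fingerprint_jsonl(path: Path) -> tuple[int, str]:
    """Return (example_count, sha256 hex digest) for a JSONL dataset file,
    so two copies of the same task can be compared quickly."""
    h = hashlib.sha256()
    count = 0
    with path.open("rb") as f:
        for line in f:
            h.update(line)
            count += 1
    return count, h.hexdigest()


def compare_dataset_dirs(a: Path, b: Path) -> list[str]:
    """List task files that are missing from `b` or whose example count
    or content hash differs between the two download directories."""
    diffs = []
    for fa in sorted(a.glob("*.jsonl")):
        fb = b / fa.name
        if not fb.exists() or fingerprint_jsonl(fa) != fingerprint_jsonl(fb):
            diffs.append(fa.name)
    return diffs
```

Any task listed by `compare_dataset_dirs` (e.g. a ModelScope copy vs. a Hugging Face copy) is a candidate explanation for score differences on that task.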