nvtransfer / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

Detailed scores of Phi-3-mini-128k #71

Closed: huangyuxiang03 closed this issue 1 week ago

huangyuxiang03 commented 1 week ago

Hi, thanks for your great work on evaluating the long-context ability of LLMs! I also enjoyed your poster presentation at COLM 2024. Could you provide the raw scores for Phi-3-mini-128k? It seems this model is not included in the original paper. Thanks.

hsiehjackson commented 1 week ago

@huangyuxiang03 here are the full results.

| Length | niah_single_1 | niah_single_2 | niah_single_3 | niah_multikey_1 | niah_multikey_2 | niah_multikey_3 | niah_multivalue | niah_multiquery | vt | fwe | cwe | qa_1 | qa_2 | avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4k | 100.0 | 100.0 | 100.0 | 96.6 | 99.6 | 99.6 | 92.2 | 98.5 | 97.1 | 85.1 | 91.9 | 83.8 | 54.2 | 92.2 |
| 8k | 100.0 | 100.0 | 100.0 | 96.6 | 100.0 | 99.4 | 95.2 | 98.5 | 99.3 | 83.4 | 88.6 | 75.8 | 53.0 | 91.5 |
| 16k | 100.0 | 100.0 | 100.0 | 95.4 | 100.0 | 98.6 | 92.8 | 98.0 | 99.3 | 86.7 | 78.7 | 78.8 | 50.2 | 90.7 |
| 32k | 100.0 | 100.0 | 100.0 | 94.6 | 99.6 | 97.4 | 84.4 | 96.8 | 99.0 | 86.5 | 52.3 | 75.6 | 50.8 | 87.5 |
| 64k | 100.0 | 100.0 | 100.0 | 95.0 | 98.6 | 95.2 | 88.8 | 93.4 | 87.0 | 66.4 | 3.3 | 74.2 | 45.4 | 80.6 |
| 128k | 98.6 | 97.8 | 97.8 | 86.4 | 65.2 | 42.0 | 66.4 | 69.1 | 55.0 | 84.1 | 1.8 | 66.6 | 36.2 | 66.7 |

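For anyone reusing these numbers, here is a minimal sketch that recomputes the `avg` column from the per-task scores. It assumes `avg` is the unweighted mean of the 13 task scores in each row; the thread does not state this explicitly. The names `SCORES`, `REPORTED_AVG`, and `mean_score` are illustrative and are not part of the RULER codebase.

```python
# Per-task scores copied from the table above, in column order:
# niah_single_1..3, niah_multikey_1..3, niah_multivalue, niah_multiquery,
# vt, fwe, cwe, qa_1, qa_2
SCORES = {
    "4k":   [100.0, 100.0, 100.0, 96.6, 99.6, 99.6, 92.2, 98.5, 97.1, 85.1, 91.9, 83.8, 54.2],
    "8k":   [100.0, 100.0, 100.0, 96.6, 100.0, 99.4, 95.2, 98.5, 99.3, 83.4, 88.6, 75.8, 53.0],
    "16k":  [100.0, 100.0, 100.0, 95.4, 100.0, 98.6, 92.8, 98.0, 99.3, 86.7, 78.7, 78.8, 50.2],
    "32k":  [100.0, 100.0, 100.0, 94.6, 99.6, 97.4, 84.4, 96.8, 99.0, 86.5, 52.3, 75.6, 50.8],
    "64k":  [100.0, 100.0, 100.0, 95.0, 98.6, 95.2, 88.8, 93.4, 87.0, 66.4, 3.3, 74.2, 45.4],
    "128k": [98.6, 97.8, 97.8, 86.4, 65.2, 42.0, 66.4, 69.1, 55.0, 84.1, 1.8, 66.6, 36.2],
}

# The avg column as reported in the table.
REPORTED_AVG = {"4k": 92.2, "8k": 91.5, "16k": 90.7, "32k": 87.5, "64k": 80.6, "128k": 66.7}


def mean_score(scores):
    """Unweighted mean of one row's task scores (assumed definition of avg)."""
    return sum(scores) / len(scores)


# Recompute each row's average, rounded to one decimal like the table.
recomputed = {length: round(mean_score(row), 1) for length, row in SCORES.items()}

for length, value in recomputed.items():
    print(f"{length:>4}: recomputed {value:.1f}, reported {REPORTED_AVG[length]:.1f}")
```

Under that assumption, the recomputed values agree with the reported `avg` column to one decimal place at every length.
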
huangyuxiang03 commented 1 week ago

Thank you for providing these results. I have no further questions, so I'm closing this issue.