nvtransfer / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

hope add qwen2-7b-chat result #46

Closed Chandler-Bing closed 2 months ago

Chandler-Bing commented 2 months ago

Thanks for the great project. I hope you can add official Qwen2-7B-Chat results to compare with GLM-4-9B-Chat. According to the Qwen tech report, Qwen2 is much better than GLM4 on long-context evaluation, but I doubt it...

hsiehjackson commented 2 months ago

According to Table 12 in the Qwen2 tech report, if we just evaluate vanilla Qwen2-7b-128K and GLM4-9b-1M, then GLM4 may get better results. However, Qwen2 also proposes YaRN+DCA (a training-free context extension) to boost its performance. I didn't find the code to run inference with these techniques, so unfortunately I don't have results for comparison.
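For context, Qwen2's public usage notes describe enabling YaRN by adding a `rope_scaling` entry to the model's `config.json`. The fragment below is a sketch of that approach; the specific factor and field values here are illustrative assumptions, and this only covers the YaRN part, not DCA, which requires inference-side changes:

```json
{
  "max_position_embeddings": 131072,
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Note that DCA (Dual Chunk Attention) modifies the attention computation itself, so a config change alone would not reproduce the full YaRN+DCA setup from the tech report.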

Chandler-Bing commented 2 months ago

Thanks!