hsiehjackson / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

gpt-4o results? #12

Open the21st opened 1 month ago

the21st commented 1 month ago

Would love to see results for gpt-4o. There was some claimed improvement in its abilities: http://nian.llmonpy.ai/

hsiehjackson commented 1 month ago

We also plan to run the evaluation for gpt-4o! It looks like gpt-4o shows a large improvement on the lost-in-the-middle issue.