nvtransfer / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

A mistral long context - MegaBeam-Mistral-512K #48

Closed: chenwuperth closed this issue 2 months ago

chenwuperth commented 2 months ago

Hi, thanks for the project! Could you please evaluate https://huggingface.co/aws-prototyping/MegaBeam-Mistral-7B-512k on the latest RULER benchmark? Thanks!

hsiehjackson commented 2 months ago

Sure! I put the results on the leaderboard (under our evaluation), although I saw you have already tested it on your own. This is a pretty good long-context model. It would be great to have numbers showing its short-context performance (MMLU, MT-Bench, or something on the Open LLM Leaderboard).

chenwuperth commented 2 months ago

Thank you for testing it! Yes, I just wanted to confirm that our eval is consistent with yours (which appears to be the case). I will take a look at the short-context benchmarks, although we focused solely on "long" context when training this model.