Can you add our new model, Llama3-PBM-Nova-70B, to the leaderboard?
Llama3-PBM-Nova-70B has been developed using meticulously designed SFT and RLHF techniques, building on the Meta-Llama-3-70B model. The evaluation results on open-source benchmarks are provided below:
Evaluation:

| Model | Arena-Hard | MixEval-Hard | Alpaca-Eval 2.0 |
| --- | --- | --- | --- |
| GPT-4Turbo(04/09) | 82.6% | 62.6 | 55.0% |
| GPT-4o(05/13) | 79.2% | 64.7 | 57.5% |
| Gemini 1.5 Pro | 72.0% | 58.3 | - |
| Llama3-PBM-Nova-70B | 74.5% | 58.1 | 61.23% |
| Llama-3.1-70B-Instruct | 55.7% | - | 38.1% |
| Llama-3-70B-Instruct | 46.6% | 55.9 | 34.4% |
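Compared to the current state-of-the-art models, Llama3-PBM-Nova-70B achieves top-tier performance among open-source models and rivals or surpasses some closed-source models: it exceeds Gemini 1.5 Pro on Arena-Hard (74.5% vs. 72.0%) and scores higher than GPT-4o(05/13) on Alpaca-Eval 2.0 (61.23% vs. 57.5%).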
Our result is attached here.