Closed by rombodawg 4 months ago
@rombodawg I gave it a spin this morning; the leaderboard should be updated in a few minutes. There's a fairly sizable discrepancy on this one at FP16 when run with transformers (better) vs vllm (~50% worse, relatively speaking) 🤔 Add to this the fact that every single 8B Llama3 code finetune I've evaluated so far has performed significantly worse than Meta's Llama3-Instruct, and I'm growing suspicious that L3 8B finetunes aren't actually working properly across the board.
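For anyone who wants to reproduce the backend comparison, here's a minimal sketch of loading the same checkpoint at FP16 under both transformers and vllm with greedy decoding. The model id and prompt are placeholders, not the actual evaluation harness used here:

```python
# Minimal FP16 comparison sketch (not the actual eval harness).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "rombodawg/Replete-Coder-Llama3-8B"  # placeholder: checkpoint under test
prompt = "Write a Python function that reverses a string."

# --- transformers backend, FP16, greedy decoding ---
tok = AutoTokenizer.from_pretrained(model_id)
hf_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
inputs = tok(prompt, return_tensors="pt").to(hf_model.device)
out = hf_model.generate(**inputs, max_new_tokens=256, do_sample=False)
print("transformers:", tok.decode(
    out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))

# --- vllm backend, FP16, greedy decoding ---
# Note: loading both backends in one process may exceed GPU memory;
# run each half separately if it OOMs.
llm = LLM(model=model_id, dtype="float16")
params = SamplingParams(temperature=0.0, max_tokens=256)
print("vllm:", llm.generate([prompt], params)[0].outputs[0].text)
```

With greedy decoding on both sides, any remaining divergence comes from the backends themselves (kernels, prompt templating, tokenization), which is what makes the FP16 gap above so suspicious.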
Hey @the-crypt-keeper, I made a bigger version of that Replete-Coder model. This one should perform a lot better. I think you'd want to add it to your leaderboard.