the-crypt-keeper / can-ai-code

Self-evaluating interview for AI coders
https://huggingface.co/spaces/mike-ravkine/can-ai-code-results
MIT License
524 stars, 30 forks

Evaluate Replete-AI/Replete-Coder-Llama3-8B #208

Closed: rombodawg closed this issue 3 months ago

rombodawg commented 3 months ago

Hey @the-crypt-keeper I made a bigger version of that Replete-Coder model. This one should perform a lot better. I think you'd want to add this to your leaderboard.

the-crypt-keeper commented 3 months ago

@rombodawg I gave it a spin this morning; the leaderboard should be updated in a few minutes. There's a fairly sizable discrepancy on this one at FP16 when run with transformers (better) vs vllm (~50% worse, relatively speaking) 🤔 Add to this the fact that every single 8B Llama3 code finetune I've evaluated so far has performed significantly worse than Meta's Llama3-Instruct, and I am growing suspicious that L3 8B finetunes aren't actually working properly across the board.
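For readers unsure what "~50% worse, relatively speaking" means here, a minimal sketch of the arithmetic follows. The helper name `relative_drop` and both pass rates are hypothetical placeholders chosen only to illustrate the calculation; they are not the actual scores for this model.

```python
def relative_drop(baseline: float, other: float) -> float:
    """Return the fraction by which `other` falls short of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - other) / baseline


# Hypothetical pass rates for illustration only, NOT real leaderboard results.
transformers_score = 0.60  # assumed pass rate under transformers
vllm_score = 0.30          # assumed pass rate under vllm

absolute_gap = transformers_score - vllm_score
print(f"absolute gap:  {absolute_gap:.2f}")
print(f"relative drop: {relative_drop(transformers_score, vllm_score):.0%}")
```

With these placeholder numbers the absolute gap is 30 percentage points, while the relative drop is 50% because it is measured against the better run's score.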