the-crypt-keeper / can-ai-code

Self-evaluating interview for AI coders
https://huggingface.co/spaces/mike-ravkine/can-ai-code-results
MIT License

Guide on how to evaluate models #180

Open kisimoff opened 3 months ago

kisimoff commented 3 months ago

I'm willing to test a few models and share the results. I've looked at the README, but couldn't wrap my head around how to benchmark a model. Any help would be appreciated!

the-crypt-keeper commented 3 months ago

The docs definitely need a rewrite, my apologies.

The general flow is:

1) prepare.py
2) interview*.py
3) eval.py

In the dark days we had to deal with dozens of prompt formats, but these days prepare.py can be run with `--chat <hfmodel>` and it will sort out the prompt format for you.
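
Putting that together, a minimal end-to-end run might look something like the sketch below. Everything except `--chat` is an assumption here; check each script's `--help` for the arguments it actually accepts, and note there are several `interview_*.py` backends depending on how you host the model.

```sh
# 1) Build model-specific prompts from the interview questions,
#    letting prepare.py pick up the model's chat template
./prepare.py --chat <hf-model-id>

# 2) Run the interview with whichever interview_*.py backend fits
#    your setup (flag names are assumptions, verify with --help)
./interview_cuda.py --model <hf-model-id> --input <prepared-prompts>

# 3) Score the answers
./eval.py --input <interview-results>
```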

Note that there are two interviews, junior-v2 and senior; I usually only run senior on strong models that score >90% on junior. See the sketch below for the ordering.
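
Assuming prepare.py takes an `--interview` flag to select the question set (an assumption on my part, verify against `--help`), the ordering would be:

```sh
# Start every model on the junior-v2 question set
./prepare.py --interview junior-v2 --chat <hf-model-id>

# Only if the model clears ~90% on junior-v2, repeat the full
# prepare/interview/eval flow with the senior question set
./prepare.py --interview senior --chat <hf-model-id>
```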