the-crypt-keeper / can-ai-code

Self-evaluating interview for AI coders
https://huggingface.co/spaces/mike-ravkine/can-ai-code-results
MIT License
524 stars 30 forks

Too few questions for senior category #193

Open rodion-m opened 4 months ago

rodion-m commented 4 months ago

Hi! Thanks for a useful benchmark. The separation of quants is great.

Am I right in thinking that we currently have only one question in the senior category? I'm afraid it might not be representative.

rodion-m commented 4 months ago

Also, if there is only one question, what does the "Passed" column mean here? [image]

the-crypt-keeper commented 4 months ago

@rodion-m There are 3 questions in total; the meat of this test is in vm.yaml, which asks the LLM to build an assembler. The score is based on how well the answer performs on the checks. The model can get up to `weight` points for each check, and partial marks are possible: for example, if 3 of 8 returned list entries are correct, the model will still get 3 points. Otherwise the scores would all be too low.
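
To make the partial-marks idea concrete, here is a minimal Python sketch of how such scoring could work. It is not the benchmark's actual evaluator: the names `score_check` and `score_answer`, the dictionary layout of a check, and the rule "one point per correct list entry, capped at the check's weight" are all assumptions chosen only to reproduce the 3-of-8 example above.

```python
from typing import Any


def score_check(actual: Any, expected: Any, weight: int) -> int:
    """Points earned on one check (hypothetical helper, not the repo's code)."""
    if isinstance(expected, list):
        if not isinstance(actual, list):
            return 0
        # Assumption: one point per position-wise matching entry, capped at weight.
        matches = sum(1 for a, e in zip(actual, expected) if a == e)
        return min(matches, weight)
    # Assumption: scalar checks are all-or-nothing.
    return weight if actual == expected else 0


def score_answer(checks: list[dict]) -> tuple[int, int]:
    """Return (points earned, points possible) summed over all checks."""
    earned = sum(score_check(c["actual"], c["expected"], c["weight"]) for c in checks)
    possible = sum(c["weight"] for c in checks)
    return earned, possible


# Made-up checks for illustration: a list check with 3 of 8 entries correct,
# plus one scalar check that passes.
checks = [
    {"actual": [1, 2, 3, 0, 0, 0, 0, 0], "expected": [1, 2, 3, 4, 5, 6, 7, 8], "weight": 8},
    {"actual": "ok", "expected": "ok", "weight": 1},
]
earned, possible = score_answer(checks)
print(f"Passed {earned} of {possible}")  # prints "Passed 4 of 9"
```

Under these assumptions, a "Passed" figure is a count of points across all checks rather than a count of questions, which is why it can be well above the number of questions.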