Any full report or paper arxiv?

the-crypt-keeper / can-ai-code

Self-evaluating interview for AI coders

https://huggingface.co/spaces/mike-ravkine/can-ai-code-results

MIT License

Any full report or paper arxiv? #152

Closed: zhimin-z closed this issue 7 months ago

the-crypt-keeper commented 7 months ago

This is a perpetual project; the latest reports can be seen on the leaderboard page: https://huggingface.co/spaces/mike-ravkine/can-ai-code-results

No plans for a paper, I'm afraid; I am a hobbyist with no academic affiliation.

zhimin-z commented 7 months ago

So the leaderboard is basically the de facto report?

the-crypt-keeper commented 7 months ago

@zhimin-z Yes, the leaderboard is the report.

Note that there are two interviews in there: "junior-v2" is the original, simple one, which most modern models can easily pass, and "senior" is a new, much more difficult one that most models are failing. I am still working on senior and need to add more tasks.

All the evaluation data is available inside the results/ folder of the repo if you'd like to dig deeper into why a particular model has the score it does.