[ ] I hereby confirm that NO LLM-based technology (such as github copilot) was used while writing this benchmark
This PR contains:
[ ] new generator-functions that allow sampling from other LLMs
[ ] new samples (sample_....jsonl files)
[ ] new benchmarking results (..._results.jsonl files)
[x] documentation update
[ ] bug fixes
Related GitHub issue (if relevant): closes #0
Short description:
This adds a way to estimate the number of unit tests per test case: we count the actual assert statements that exercise the functions under test.
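For illustration, the count could be done with Python's `ast` module; this is a minimal sketch of the idea, and the helper name `count_asserts` is my own here, not necessarily how this PR implements it:

```python
import ast

def count_asserts(test_source: str) -> int:
    """Count assert statements in a test case's source (hypothetical helper)."""
    tree = ast.parse(test_source)
    # ast.walk visits every node, so asserts nested in loops/ifs are counted too
    return sum(isinstance(node, ast.Assert) for node in ast.walk(tree))

example = (
    "def check(candidate):\n"
    "    assert candidate(1) == 2\n"
    "    assert candidate(2) == 4\n"
)
print(count_asserts(example))  # -> 2
```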
How do you think this will influence the benchmark results?
It adds another metric, one that describes the quantity of tests per test case.
Why do you think it makes sense to merge this PR?
I'm not sure we need it. The original HumanEval paper reports approx. 7.7 unit tests per test case, while we count approx. 2.5 assert statements per test case; I'm not sure these numbers are directly comparable.