haesleinhuepf / human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation
MIT License

Add Bland-Altman test case #31

Closed by haesleinhuepf 2 months ago

haesleinhuepf commented 2 months ago

This PR contains:

Related GitHub issue (if relevant): closes #0

Short description:

How do you think this will influence the benchmark results?

Why do you think it makes sense to merge this PR?
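For context, a Bland-Altman analysis quantifies agreement between two measurement methods by computing the mean difference (bias) and the limits of agreement (bias ± 1.96 standard deviations of the differences). The sketch below illustrates what such a test case might compute; the function name `bland_altman_statistics` and its signature are illustrative assumptions, not necessarily the code added in this PR.

```python
import numpy as np

def bland_altman_statistics(measurements_a, measurements_b):
    """Compute Bland-Altman agreement statistics for two paired
    sets of measurements.

    Returns the mean difference (bias) and the lower and upper
    limits of agreement (bias +/- 1.96 sample standard deviations
    of the differences).
    """
    a = np.asarray(measurements_a, dtype=float)
    b = np.asarray(measurements_b, dtype=float)
    differences = a - b
    bias = differences.mean()
    spread = 1.96 * differences.std(ddof=1)  # sample standard deviation
    return bias, bias - spread, bias + spread


# Example usage with two hypothetical sets of paired measurements:
bias, lower, upper = bland_altman_statistics(
    [10.1, 9.8, 10.5, 10.2], [10.0, 9.9, 10.3, 10.4])
print(bias, lower, upper)
```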