haesleinhuepf / human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation
MIT License

add notebook to summarize common failure reasons #51

Closed haesleinhuepf closed 5 months ago

haesleinhuepf commented 5 months ago

This PR contains:

Related github issue (if relevant): related to #17

Short description:

How do you think this will influence the benchmark results?

Why do you think it makes sense to merge this PR?

@tischi This might be interesting for you, since you were aiming in that direction, as discussed in #17.