haesleinhuepf / human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation
MIT License

Samples from recent open source models. #62

Closed jkh1 closed 5 months ago

jkh1 commented 5 months ago

This PR contains:

Related GitHub issue (if relevant): closes #0

Short description:

How do you think this will influence the benchmark results?

Why do you think it makes sense to merge this PR?

haesleinhuepf commented 5 months ago

Awesome, thanks @jkh1 !!

I'm merging this into a dev branch to make sure we have the evaluation, updated plots, and TeX updates in one branch before merging into main.