haesleinhuepf / human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation
MIT License

Add Claude 3.5 Sonnet #69

Closed: haesleinhuepf closed this 3 months ago

haesleinhuepf commented 3 months ago

This PR contains:

Short description:

How do you think this will influence the benchmark results?


Why do you think it makes sense to merge this PR?

Before merging this, we need to update the paper text, though.

haesleinhuepf commented 3 months ago

This test cost around $3.00 (I can't tell more precisely).