haesleinhuepf / human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation
MIT License

Ideas for evals #1

Open royerloic opened 6 months ago

royerloic commented 6 months ago

List of tasks. These were initially drawn up for Omega, but they can be adapted as functions for the purpose of this work:

royerloic commented 6 months ago

More ideas here: https://github.com/royerlab/napari-chatgpt/blob/17644864cd4c343368f574844ce1ec8cdcb5497c/manuscript/SuppTable1_Example_widgets.pdf

haesleinhuepf commented 6 months ago

Great ideas, big thanks @royerloic! I took the opportunity to turn your list into a checkbox list. Some are already ticked because I implemented similar use cases.

haesleinhuepf commented 6 months ago

A complete list of implemented test cases can be found here: https://github.com/haesleinhuepf/human-eval-bia/blob/main/test_cases/readme.md

(and this list is updated semi-automatically)