haesleinhuepf / human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation
MIT License

Add test case: Save image according to ome-ngff standards #73

Open ClementCaporal opened 1 week ago

ClementCaporal commented 1 week ago

I don't know if this is a well-defined "biological image analysis code" task, but ensuring that images produced by an LLM are saved in OME/OME-NGFF format, so that they stay compatible with other scripts, may be something important?

As the LLM might not have seen much training data for this format, it might be necessary to describe it at the beginning of the prompt, so that it can at least use the right dimension order.
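For context, a minimal sketch of how an image could be written as OME-Zarr with an explicit axis order, assuming the ome-zarr-py package; the file name, array shape and the czyx axis order are chosen only for illustration:

```python
import numpy as np
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_image

# hypothetical example image with axis order c, z, y, x
image = np.random.randint(0, 255, size=(2, 16, 64, 64), dtype=np.uint8)

# create an empty zarr store and a root group for the OME-NGFF dataset
store = parse_url("example_image.ome.zarr", mode="w").store
root = zarr.group(store=store)

# write the image together with the multiscale/axes metadata defined by OME-NGFF
write_image(image=image, group=root, axes="czyx")
```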

haesleinhuepf commented 1 week ago

Hi @ClementCaporal ,

awesome idea! We already have a test case for loading zarr files: https://github.com/haesleinhuepf/human-eval-bia/blob/7e9712670168b71bea44e17e7389dfc120dcc96e/test_cases/open_zarr.ipynb

Do you think you could formulate a similar task for saving ngff files?
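For illustration, a rough sketch of how such a saving task could be phrased, loosely following the docstring-as-prompt pattern of the existing test cases (the exact notebook layout is assumed from open_zarr.ipynb, and the function name is made up); a plain zarr variant is shown here, with the OME-NGFF-specific metadata to be added on top:

```python
def save_image_as_zarr(image, filename):
    """
    Takes an image given as a numpy array and saves it to disk
    as a zarr file under the specified filename.
    """
    import zarr

    # store the array on disk in zarr format
    zarr.save(filename, image)
```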

Thanks for the suggestion!

ClementCaporal commented 1 day ago

Hello,

Sorry for the delay, I am discovering how frustrating it can be to create a good prompt for a "simple" task. I am still working on a clever way to "ask" for a good ngff file...

Meanwhile I made this notebook that just checks whether the LLM is able to save a zarr file: https://gist.github.com/ClementCaporal/02a9401877c2b8b49fe6edbca5816962 The check is permissive about whether a group or the array itself is saved directly in the zarr (Llama tends to save it under a name, while GPT-3.5 saves it directly as an array).
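A minimal sketch of such a permissive check, assuming the candidate function takes a numpy array and a filename; the helper name and the test file name are made up for illustration:

```python
import numpy as np
import zarr

def _load_saved_array(filename):
    # accept both a bare zarr array and a zarr group containing one array,
    # since different models save the data in different ways
    loaded = zarr.open(filename, mode="r")
    if isinstance(loaded, zarr.Array):
        return np.asarray(loaded)
    arrays = list(loaded.array_keys())
    assert len(arrays) > 0, "no array found inside the saved zarr group"
    return np.asarray(loaded[arrays[0]])

def check(candidate):
    # ask the candidate function to save a random test image ...
    image = np.random.random((32, 32))
    filename = "temp_saved_image.zarr"
    candidate(image, filename)

    # ... then reload it and compare with the original data
    reloaded = _load_saved_array(filename)
    assert np.allclose(reloaded, image)
```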

haesleinhuepf commented 21 hours ago

Hi @ClementCaporal ,

awesome, the test-case looks good to me! Consider sending a pull-request. You can also later add a more complicated test-case for zarr files.

And please do not make any decisions on how to change the test case based on how individual LLMs solve it. Ideally, you write a test case without favoring any particular LLM; otherwise you would bias the test 😉