[x] I hereby confirm that NO LLM-based technology (such as github copilot) was used while writing this benchmark
[ ] new generator-functions allowing to sample from other LLMs
[ ] new samples (sample_....jsonl files)
[ ] new benchmarking results (..._results.jsonl files)
[ ] documentation update
[ ] bug fixes
Related github issue (if relevant): closes #23
Short description:
This adds a new test case for processing images in tiles. I'm using dask for this (a new dependency for this project), but it could certainly be done without it.
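For reviewers unfamiliar with the pattern: tiled processing means splitting a large image into fixed-size blocks, applying an operation per block, and reassembling the result. A minimal, dependency-free sketch of the idea (function and parameter names here are illustrative assumptions, not the actual sample code; the PR itself uses dask's chunked arrays for the same effect):

```python
import numpy as np

def process_in_tiles(image, tile_size=(256, 256), func=lambda t: t):
    """Apply `func` to each tile of a 2D image and reassemble.

    Hypothetical sketch only -- the new test case in this PR uses dask
    instead of an explicit loop, but the tiling logic is equivalent.
    """
    out = np.empty_like(image)
    h, w = image.shape
    th, tw = tile_size
    for y in range(0, h, th):
        for x in range(0, w, tw):
            # Slicing clips automatically at the image border,
            # so edge tiles may be smaller than tile_size.
            tile = image[y:y + th, x:x + tw]
            out[y:y + th, x:x + tw] = func(tile)
    return out

image = np.arange(16, dtype=np.uint8).reshape(4, 4)
result = process_in_tiles(image, tile_size=(2, 2), func=lambda t: t * 2)
```

With dask, the explicit double loop is replaced by `da.from_array(image, chunks=tile_size)` followed by `map_blocks(func)`, which additionally parallelizes the per-tile work.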
How do you think this will influence the benchmark results?
I have not tested this, but I presume this test case is a hard one; it may be that no current LLM can solve it, which would decrease pass rates for all LLMs.
Why do you think it makes sense to merge this PR?
Tiled image processing is a common task in bio-image analysis. It makes sense to include this in our benchmark.