haesleinhuepf / human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation
MIT License

Remove small labels: Unclear prompt #111

Open tischi opened 1 week ago

tischi commented 1 week ago

Hi @haesleinhuepf,

Here the prompt is:

Takes a label_image and removes all objects that are smaller than a given size_threshold.

While this may be just precise enough, I am not sure whether we should be a bit more specific.

I just tried the following with ChatGPT:

Q:
write a python function that takes an image as an input and computes the sum of all pixels

A:

  Certainly! Below is a Python function that takes an image as input and
  computes the sum of all its pixel values. This function uses the PIL
  (Pillow) library to handle image operations.

    from PIL import Image
    import numpy as np

    def sum_of_pixels(image_path):
        # Open the image using PIL
        image = Image.open(image_path)

        # Convert the image to a numpy array
        image_array = np.array(image)

        # Compute the sum of all pixel values
        total_sum = np.sum(image_array)

        return total_sum

    # Example usage
    image_path = 'path_to_your_image.jpg'
    total_sum = sum_of_pixels(image_path)
    print("Sum of all pixels:", total_sum)

So, saying that the "input is an image" can be interpreted either as a path to an image file or as a numpy array (and maybe even something else).
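To make the two readings concrete, here is a minimal sketch of a function that commits to the numpy-array reading and rejects a file path explicitly instead of failing somewhere deeper. The function name and the error message are hypothetical and not taken from the benchmark.

```python
import os

import numpy as np


def sum_of_pixels(image):
    """Takes an image as numpy array and returns the sum of all pixels."""
    # Hypothetical guard: a path-like argument means the caller followed
    # the "image is a file path" reading of the prompt.
    if isinstance(image, (str, os.PathLike)):
        raise TypeError("expected pixel data as a numpy array, got a file path")
    # np.asarray also accepts nested lists, so array-likes work too
    return np.asarray(image).sum()


if __name__ == "__main__":
    pixels = np.array([[1, 2], [3, 4]])
    print(sum_of_pixels(pixels))
```

A test case that passes an array to a path-based solution (or the other way round) fails for reasons unrelated to the actual image-analysis task, which is why the prompt wording matters here.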

I am wondering whether we should make our prompts more specific, like:

haesleinhuepf commented 1 week ago

Interesting! That's an issue I rarely hit in practice, but I can confirm your observation. ChatGPT produced pretty much the same code on my side.

That means we have to update many test cases...

How about doing this on a separate branch, and comparing the performance on a single model (old versus new docstrings) just for documentation purposes?

tischi commented 1 week ago

> How about doing this on a separate branch, and comparing the performance on a single model (old versus new docstrings) just for documentation purposes?

Sounds good to me! But, who does it? Do you think one of the other contributors could be motivated?


And the devil's in the detail here:

Q:
Write a python function that takes an intensity_image as an input and returns the sum of all pixels.

A:

  Certainly! Below is a Python function that takes an intensity image
  (assumed to be in the form of a 2D NumPy array) as an input and returns the
  sum of all its pixel values:

    import numpy as np

    def sum_of_pixels(intensity_image):
        """
        Calculate the sum of all pixel values in the intensity image.

        Parameters:
        intensity_image (numpy.ndarray): A 2D array representing the
            intensity image.

        Returns:
        int/float: The sum of all pixel values.
        """
        return np.sum(intensity_image)

Instead of image I wrote intensity_image, which apparently, and I think rightly so, suggests to the LLM that this is a variable that already contains image data.

This could also be one way forward: recommend using prompts with meaningful variable names, for example:

Takes an image_path as an input and returns image_width, image_height.

Or

Takes an image_numpy_array as an input and returns image_width, image_height.

Basically, whenever needed specify the type in the variable name.
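As a rough illustration of how the two prompt variants lead to different code, here is a minimal sketch. The function names are hypothetical. The array variant assumes the usual numpy layout (rows first, then columns), and the path variant assumes Pillow is available.

```python
import numpy as np


def get_size_from_array(image_numpy_array):
    """Takes an image_numpy_array as an input and returns image_width, image_height."""
    # Assumption: axis 0 is rows (height) and axis 1 is columns (width);
    # any further axes (e.g. color channels) are ignored.
    image_height, image_width = image_numpy_array.shape[:2]
    return image_width, image_height


def get_size_from_path(image_path):
    """Takes an image_path as an input and returns image_width, image_height."""
    # Imported lazily so the array variant works without Pillow installed.
    from PIL import Image

    with Image.open(image_path) as image:
        # Pillow reports size as (width, height)
        image_width, image_height = image.size
    return image_width, image_height


if __name__ == "__main__":
    print(get_size_from_array(np.zeros((3, 5))))
```

Note that even with the type settled, the array variant still has to decide which axis is the width, so the variable name removes only one of the ambiguities.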

I am not sure what's best practice here...

haesleinhuepf commented 1 week ago

> Basically, whenever needed specify the type in the variable name.

I would still try to keep the text close to how humans speak. "A numpy image" or "A numpy-array image" seems OK. I'm also using label_image in quite a few cases where I thought it was OK too. This seems a bit too extreme to me: "Takes an image_numpy_array as an input and returns image_width, image_height."

haesleinhuepf commented 1 week ago

> Sounds good to me! But, who does it?

I can do it. It takes longer to explain this to someone than to do it.

tischi commented 1 week ago

All right, so would you then, for consistency, change label_image to "numpy label image"?

haesleinhuepf commented 1 week ago

Yes, or something like "label image as numpy array" to be a bit more human-readable.
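For reference, a minimal sketch of what the reworded docstring could look like, together with one possible numpy-only solution. The function name is hypothetical, and the sketch assumes that labels are non-negative integers, that 0 is the background, that size means pixel count, and that "smaller than" is a strict comparison. None of these are confirmed by the prompt itself.

```python
import numpy as np


def remove_small_labels(label_image, size_threshold):
    """
    Takes a label image as numpy array and removes all objects that are
    smaller than a given size_threshold.
    """
    label_image = np.asarray(label_image)
    # Pixel count per label id; index 0 is the background (assumption)
    sizes = np.bincount(label_image.ravel())
    # Mark labels with strictly fewer pixels than the threshold
    too_small = sizes < size_threshold
    too_small[0] = False  # never remove the background
    result = label_image.copy()
    # Look up the flag for every pixel's label and zero out the small ones
    result[too_small[label_image]] = 0
    return result


if __name__ == "__main__":
    labels = np.array([[1, 1, 0],
                       [1, 0, 2],
                       [0, 0, 2]])
    print(remove_small_labels(labels, size_threshold=3))
```

The sketch keeps the remaining label ids unchanged. Whether the surviving labels should be relabeled sequentially is another detail the prompt leaves open.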