openai / human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"
MIT License

Task 145 makes no sense #43

Open geajack opened 6 months ago

geajack commented 6 months ago

The prompt, canonical solution and tests for task 145 are:

def order_by_points(nums):
    """
    Write a function which sorts the given list of integers
    in ascending order according to the sum of their digits.
    Note: if there are several items with similar sum of their digits,
    order them based on their index in original list.

    For example:
    >>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
    >>> order_by_points([]) == []
    """
    def digits_sum(n):
        neg = 1
        if n < 0: n, neg = -1 * n, -1 
        n = [int(i) for i in str(n)]
        n[0] = n[0] * neg
        return sum(n)

    return sorted(nums, key=digits_sum)

def check(candidate):
    # Check some simple cases
    assert candidate([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
    assert candidate([1234,423,463,145,2,423,423,53,6,37,3457,3,56,0,46]) == [0, 2, 3, 6, 53, 423, 423, 423, 1234, 145, 37, 46, 56, 463, 3457]
    assert candidate([]) == []
    assert candidate([1, -11, -32, 43, 54, -98, 2, -3]) == [-3, -32, -98, -11, 1, 2, 43, 54]
    assert candidate([1,2,3,4,5,6,7,8,9,10,11]) == [1, 10, 2, 11, 3, 4, 5, 6, 7, 8, 9]
    assert candidate([0,6,6,-76,-21,23,4]) == [-76, -21, 0, 4, 23, 6, 6]

    # Check some edge cases that are easy to work out by hand.
    assert True, "This prints if this assert fails 2 (also good for debugging!)"

This makes no sense for negative inputs. For example, look at the first test:

assert candidate([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]

One reasonable interpretation of "digit sum" for a negative integer would be the digit sum of its absolute value. Another would be the negative of that digit sum. But neither of those rules seems to be used here. Instead, judging by the canonical solution, the minus sign is meant to apply only to the first digit of the number, so a number like -186 breaks down as (-1, 8, 6) and has a digit sum of -1 + 8 + 6 = 13.
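To make the comparison concrete, here is a rough sketch of the three interpretations applied to the first test case. The function names are my own and are not part of the benchmark; `digit_sum_signed_first` is only my reading of what the canonical `digits_sum` does.

```python
# Three candidate definitions of "digit sum" for negative integers.
# All names here are hypothetical and chosen for illustration only.

def digit_sum_abs(n):
    """Digit sum of the absolute value: -186 -> 1 + 8 + 6."""
    return sum(int(d) for d in str(abs(n)))

def digit_sum_negated(n):
    """Negative of the digit sum of the absolute value: -186 -> -(1 + 8 + 6)."""
    s = digit_sum_abs(n)
    return -s if n < 0 else s

def digit_sum_signed_first(n):
    """Minus sign applies to the first digit only: -186 -> -1 + 8 + 6."""
    digits = [int(d) for d in str(abs(n))]
    if n < 0:
        digits[0] = -digits[0]
    return sum(digits)

nums = [1, 11, -1, -11, -12]
expected = [-1, -11, 1, -12, 11]  # taken from the first test of task 145

for key in (digit_sum_abs, digit_sum_negated, digit_sum_signed_first):
    # sorted() is stable, so ties keep their original index order
    result = sorted(nums, key=key)
    print(f"{key.__name__}: {result} matches={result == expected}")
```

On this input, only the signed-first-digit version reproduces the expected ordering, which is consistent with the canonical solution being the odd one out among the readings a person would try first.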

In principle, this rule could be inferred from the example given in the prompt, but the intended meaning is still extremely difficult to guess, even for a human. This may be a deliberate design choice, but I think it's worth flagging here so that people are at least aware of it.