nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

Perl Unit test when expecting "False/0" output #66

Open PootieT opened 1 year ago

PootieT commented 1 year ago

I suspect this has been debated before, but the way boolean values are tested in Perl may need improvement.

Example: HumanEval_92_any_int

sub any_int {
    my($x, $y, $z) = @_;
    # some perl program that returns 0/1
}
use Test::Deep;

sub testhumaneval {
    my $candidate = \&any_int;
        if(eq_deeply($candidate->(2, 3, 1),1)) {
        print "ok!" }else{
        exit 1;
        }
        if(eq_deeply($candidate->(2.5, 2, 3),"")) {
        print "ok!" }else{
        exit 1;
        }
        if(eq_deeply($candidate->(1.5, 5, 3.5),"")) {
        print "ok!" }else{
        exit 1;
        }
        if(eq_deeply($candidate->(2, 6, 2),"")) {
        print "ok!" }else{
        exit 1;
        }
        if(eq_deeply($candidate->(4, 2, 2),1)) {
        print "ok!" }else{
        exit 1;
        }
        if(eq_deeply($candidate->(2.2, 2.2, 2.2),"")) {
        print "ok!" }else{
        exit 1;
        }
        if(eq_deeply($candidate->(-4, 6, 2),1)) {
        print "ok!" }else{
        exit 1;
        }
        if(eq_deeply($candidate->(2, 1, 1),1)) {
        print "ok!" }else{
        exit 1;
        }
        if(eq_deeply($candidate->(3, 4, 7),1)) {
        print "ok!" }else{
        exit 1;
        }
        if(eq_deeply($candidate->(3.0, 4, 7),"")) {
        print "ok!" }else{
        exit 1;
        }
}

testhumaneval();

At the moment, when the program is expected to return False, its result is compared against "" with eq_deeply. Many Perl generations, though, return 0/1, and the comparison between 0 and "" evaluates to false:

eq_deeply(0, "") # -> False

One possible solution is to use the function's return value directly as the condition of the if statement for that unit test (only when the expected output is boolean):

if($candidate->(2, 3, 1)) {   #expect True
        print "ok!" }else{
        exit 1;
        }
if(!$candidate->(2.5, 2, 3)) {   #expect False
        print "ok!" }else{
        exit 1;
        }

arjunguha commented 1 year ago

@mgree / @mhyee any opinions on this one?

mgree commented 1 year ago

Probably better to generate custom equality checking code when the expected output is "" rather than munging the test case translation itself.

arjunguha commented 1 year ago

What type of error is this:

Here is an argument for the latter: this is what the Test::Deep library does. If it's a reasonable testing library, we should just use its notions of equality.

mhyee commented 1 year ago

I think the comparison should be relaxed. It looks like "numeric false" (0 or "0") and "string false" ("") aren't equal but should be.

A scalar value is interpreted as FALSE in the Boolean sense if it is undefined, the null string or the number 0 (or its string equivalent, "0"), and TRUE if it is anything else.

https://perldoc.perl.org/perldata#Scalar-values
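
As a small illustration of the rule quoted above, the following sketch checks a handful of scalars against Perl's own truthiness. The helper name `truthiness` and the sample values are assumptions made up for this example; they are not part of MultiPL-E or its generated tests.

```perl
use strict;
use warnings;

# The four values perldata lists as false in Boolean context.
my @false_values = (undef, "", 0, "0");
# Strings that may look empty or zero-like but are still true.
my @true_values = (1, "00", "0.0", " ", "a");

# Hypothetical helper: map any scalar to a plain 0/1 flag
# using Perl's built-in truthiness rules.
sub truthiness {
    my ($value) = @_;
    return $value ? 1 : 0;
}

for my $v (@false_values, @true_values) {
    my $shown = defined $v ? "\"$v\"" : "undef";
    printf "%-7s is %s\n", $shown, truthiness($v) ? "true" : "false";
}
```

Under this view 0, "0" and "" all land in the same bucket, which is the sense in which "numeric false" and "string false" could be treated as equal for boolean-valued tests.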

Some StackOverflow answers recommend !!0 for false (others discourage it as obscure). I tried it, and Test::Deep considers it different from 0 but equal to "".

mgree commented 1 year ago

It seems like Perl's == does the right thing:

$ perl -e 'print((0 == "") . "\n"); print((!!0 == "") . "\n"); print((1 == "") . "\n")'
1
1

Maybe we can say $got == $expected || eq_deeply($got, $expected)?
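
One way the relaxed check might look is sketched below. It is only an illustration: `result_matches` and its `$expect_bool` flag are hypothetical and not part of MultiPL-E, and the sketch assumes the test translator knows when the expected value is a boolean. It compares truthiness instead of using a bare `==`, because numeric `==` also treats unrelated non-numeric strings such as "abc" and "xyz" as equal (both numify to 0).

```perl
use strict;
use warnings;
use Test::Deep;   # exports eq_deeply, as used in the generated tests

# Hypothetical helper: compare a candidate's result with the expected value.
# For boolean expectations only truthiness is compared; everything else is
# deferred to Test::Deep's eq_deeply.
sub result_matches {
    my ($got, $expected, $expect_bool) = @_;
    if ($expect_bool) {
        # !$x always yields one of Perl's canonical true/false values,
        # so this numeric comparison raises no warnings for undef or "".
        return (!$got) == (!$expected) ? 1 : 0;
    }
    return eq_deeply($got, $expected) ? 1 : 0;
}

print result_matches(0, "", 1), "\n";            # 1: numeric false vs. string false
print result_matches(1, "", 1), "\n";            # 0: true vs. false
print result_matches([1, 2], [1, 2], 0), "\n";   # 1: structural equality via eq_deeply

# Why a bare == is risky for non-boolean outputs:
{
    no warnings 'numeric';
    my ($a_str, $b_str) = ("abc", "xyz");
    print "plain == treats 'abc' and 'xyz' as equal\n" if $a_str == $b_str;
}
```

Restricting the relaxed comparison to boolean expectations keeps Test::Deep's notion of equality for strings, lists, and hashes, while accepting any of Perl's false values where False is expected.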