nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

Perl Unit test comparing float values #67

Closed PootieT closed 1 year ago

PootieT commented 1 year ago

Example test: HumanEval_71_triangle_area

sub testhumaneval {
    my $candidate = \&triangle_area;
    if (eq_deeply($candidate->(3, 4, 5), 6.0)) {
        print "ok!";
    } else {
        exit 1;
    }
    if (eq_deeply($candidate->(1, 2, 10), -1)) {
        print "ok!";
    } else {
        exit 1;
    }
    if (eq_deeply($candidate->(4, 8, 5), 8.18)) {
        print "ok!";
    } else {
        exit 1;
    }
    if (eq_deeply($candidate->(2, 2, 2), 1.73)) {
        print "ok!";
    } else {
        exit 1;
    }
    if (eq_deeply($candidate->(1, 2, 3), -1)) {
        print "ok!";
    } else {
        exit 1;
    }
    if (eq_deeply($candidate->(10, 5, 7), 16.25)) {
        print "ok!";
    } else {
        exit 1;
    }
    if (eq_deeply($candidate->(2, 6, 3), -1)) {
        print "ok!";
    } else {
        exit 1;
    }
    if (eq_deeply($candidate->(1, 1, 1), 0.43)) {
        print "ok!";
    } else {
        exit 1;
    }
    if (eq_deeply($candidate->(2, 2, 10), -1)) {
        print "ok!";
    } else {
        exit 1;
    }
}

testhumaneval();

An output value of 6.00 would not pass the first test, eq_deeply(6.00, 6.0). It seems the same unit test library has an is_deeply_float function that may be useful here? (link)
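For illustration, a tolerance-based comparison can be sketched in core Perl, independent of whichever library function ends up being used. The helper name approx_eq and the default tolerance of 0.01 are assumptions made for this sketch; neither comes from MultiPL-E or from the test library.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MultiPL-E or any test library):
# returns 1 when $got and $expected differ by less than $tolerance.
sub approx_eq {
    my ($got, $expected, $tolerance) = @_;
    $tolerance = 0.01 unless defined $tolerance;    # assumed default
    return abs($got - $expected) < $tolerance ? 1 : 0;
}

# Heron's formula for the (4, 8, 5) case yields roughly 8.1815,
# which is not exactly the 8.18 that the test expects.
my $s    = (4 + 8 + 5) / 2;
my $area = sqrt($s * ($s - 4) * ($s - 8) * ($s - 5));

my $close = approx_eq($area, 8.18) ? "within tolerance" : "outside tolerance";
my $exact = ($area == 8.18)        ? "exactly equal"    : "not exactly equal";
print "$close, $exact\n";
```

An unrounded result for the (4, 8, 5) case would pass a check like this while failing an exact comparison, which is the kind of gap a float-aware assertion is meant to cover.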

arjunguha commented 1 year ago

@mgree and @mhyee any opinions on this one? (We can save it until after the deadline this week.)

mgree commented 1 year ago

Every language suffers from the usual floating point equality issues. I don't see anything Perl-specific here, though:

$ perl -e 'print(6.00 == 6.0); print("\n")'
1

arjunguha commented 1 year ago

I wonder if eq_deeply is somehow more pedantic.
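One way a comparison could be more pedantic than == is by comparing string forms rather than numeric values. Whether eq_deeply behaves this way is not established in this thread; the sketch below uses only core Perl operators, and the variable names are chosen for illustration.

```perl
use strict;
use warnings;

# Numeric literals: both denote the number 6, so == holds,
# and both stringify to "6", so eq holds as well.
my $num_eq_literals = (6.00 == 6.0) ? 1 : 0;
my $str_eq_literals = (6.00 eq 6.0) ? 1 : 0;

# A formatted result is a string; sprintf keeps the trailing zeros.
my $formatted = sprintf("%.2f", 6);    # the string "6.00"

my $num_eq_formatted = ($formatted == 6.0) ? 1 : 0;    # compared as numbers
my $str_eq_formatted = ($formatted eq 6.0) ? 1 : 0;    # compares "6.00" with "6"

printf "literals:  ==: %d  eq: %d\n", $num_eq_literals,  $str_eq_literals;
printf "formatted: ==: %d  eq: %d\n", $num_eq_formatted, $str_eq_formatted;
```

Under these operators, only the formatted string fails the eq comparison, so a difference would show up for a candidate that returns formatted text rather than a number.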

mgree commented 1 year ago

$ perl -e 'use Test::Deep; print(eq_deeply(6.00, 6)); print("\n")'
1
$ perl -e 'use Test::Deep; print(eq_deeply(6.00, 6.0)); print("\n")'
1