nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

Scala tests comparing optional value #64

Closed PootieT closed 1 year ago

PootieT commented 1 year ago

Example: HumanEval_90_next_smallest

Program that failed:

import scala.math._
import scala.collection.mutable._
object Problem {
    def nextSmallest(lst : List[Long]) : Option[Long] = {
        val sortedList = lst.sorted.distinct
        if (sortedList.length < 2) None else Some(sortedList(1))
    }
    def main(args: Array[String]) = {
    // Current test: fails because `2l` (a Long) is not equal to `Some(2l)` (an Option[Long])
    assert(nextSmallest((List[Long](5l.toLong, 1l.toLong, 4l.toLong, 3l.toLong, 2l.toLong))).equals(2l));
    // Modified test: compares the Option result with an Option value
    assert(nextSmallest((List[Long](1l.toLong, 2l.toLong, 3l.toLong, 4l.toLong, 5l.toLong))).equals(Some(2l)));
    //assert(nextSmallest((List[Long]())).equals(None));
    //assert(nextSmallest((List[Long](1l.toLong, 1l.toLong))).equals(None));
    //assert(nextSmallest((List[Long](1l.toLong, 1l.toLong, 1l.toLong, 1l.toLong, 0l.toLong))).equals(1l));
    //assert(nextSmallest((List[Long](1l.toLong, 1l.toLong))).equals(None));
    //assert(nextSmallest((List[Long](-35l.toLong, 34l.toLong, 12l.toLong, -45l.toLong))).equals(-35l));
    }

}
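To make the mismatch concrete, here is a minimal sketch of how the assertions could look once every non-None expected value is wrapped in `Some`. The object name `OptionCompare` is hypothetical, and this is only an illustration of the comparison, not the output of the actual test generator.

```scala
object OptionCompare {
  // Same logic as the reported program: second-smallest distinct element, if any.
  def nextSmallest(lst: List[Long]): Option[Long] = {
    val sortedList = lst.sorted.distinct
    if (sortedList.length < 2) None else Some(sortedList(1))
  }

  def main(args: Array[String]): Unit = {
    val result = nextSmallest(List[Long](5L, 1L, 4L, 3L, 2L))

    // An Option never equals a bare value, even when it wraps that value.
    assert(!result.equals(2L))

    // Wrapping the expected value in Some makes the comparison succeed.
    assert(result.equals(Some(2L)))
    assert(result == Some(2L))

    // Lists with fewer than two distinct elements yield None.
    assert(nextSmallest(List[Long]()).equals(None))
    assert(nextSmallest(List[Long](1L, 1L)).equals(None))

    // Expected values from the commented-out tests, wrapped in Some.
    assert(nextSmallest(List[Long](1L, 1L, 1L, 1L, 0L)).equals(Some(1L)))
    assert(nextSmallest(List[Long](-35L, 34L, 12L, -45L)).equals(Some(-35L)))
  }
}
```

The `None` cases already compare correctly in the original tests; only the expected values that are plain Longs need the `Some(...)` wrapper.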

There was a similar fix a few weeks ago for a different language, but I can't remember which language it was.

arjunguha commented 1 year ago

@abhijangda I thought we had fixed this--maybe I misunderstood.

@PootieT you do have the latest prompts in the JSON files in this repo, right?

PootieT commented 1 year ago

I checked the main branch's prompts folder, and yes, I still see the same error as in my version:

https://github.com/nuprl/MultiPL-E/blob/2804ca8ed06b1fb0808f84955c4c85f311bfc51b/prompts/humaneval-scala-reworded.json#LL1005C43-L1005C43

The last update to the dataset_builder/humaneval_to_scala.py file also seems to have been 3 months ago.

arjunguha commented 1 year ago

This is merged in.