nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

MultiPL-E 3.0 release #145

Closed arjunguha closed 3 months ago

arjunguha commented 3 months ago

For the record, I'm testing DeepSeek-Coder-V2-Lite (base) and StarCoder2-15B for the release:

(base) [a.guha@d0130 data]$ column -s, -t passk.csv
Dataset                                                  Pass@k  Estimate  NumProblems  MinCompletions  MaxCompletions
humaneval-clj-deepseekcoder_v2lite_base-0.2-reworded     1       0.16      161          50              50
humaneval-clj-starcoder2_15b-0.2-reworded                1       0.16      161          50              50
humaneval-cpp-deepseekcoder_v2lite_base-0.2-reworded     1       0.46      161          50              50
humaneval-cpp-starcoder2_15b-0.2-reworded                1       0.47      161          50              50
humaneval-cs-deepseekcoder_v2lite_base-0.2-reworded      1       0.23      158          50              50
humaneval-cs-starcoder2_15b-0.2-reworded                 1       0.32      158          50              50
humaneval-d-deepseekcoder_v2lite_base-0.2-reworded       1       0.16      156          50              50
humaneval-d-starcoder2_15b-0.2-reworded                  1       0.25      156          50              50
humaneval-elixir-deepseekcoder_v2lite_base-0.2-reworded  1       0.00      161          50              50
humaneval-elixir-starcoder2_15b-0.2-reworded             1       0.00      161          50              50
humaneval-go-deepseekcoder_v2lite_base-0.2-reworded      1       0.28      154          50              50
humaneval-go-starcoder2_15b-0.2-reworded                 1       0.26      154          50              50
humaneval-hs-deepseekcoder_v2lite_base-0.2-reworded      1       0.20      156          50              50
humaneval-hs-starcoder2_15b-0.2-reworded                 1       0.17      156          50              50
humaneval-java-deepseekcoder_v2lite_base-0.2-reworded    1       0.36      158          50              50
humaneval-java-starcoder2_15b-0.2-reworded               1       0.40      158          50              50
humaneval-jl-deepseekcoder_v2lite_base-0.2-reworded      1       0.33      159          50              50
humaneval-jl-starcoder2_15b-0.2-reworded                 1       0.32      159          50              50
humaneval-js-deepseekcoder_v2lite_base-0.2-reworded      1       0.45      161          50              50
humaneval-js-starcoder2_15b-0.2-reworded                 1       0.45      161          50              50
humaneval-lua-deepseekcoder_v2lite_base-0.2-reworded     1       0.38      161          50              50
humaneval-lua-starcoder2_15b-0.2-reworded                1       0.44      161          50              50
humaneval-ml-deepseekcoder_v2lite_base-0.2-reworded      1       0.17      155          50              50
humaneval-ml-starcoder2_15b-0.2-reworded                 1       0.24      155          50              50
humaneval-php-deepseekcoder_v2lite_base-0.2-reworded     1       0.43      161          50              50
humaneval-php-starcoder2_15b-0.2-reworded                1       0.34      161          50              50
humaneval-pl-deepseekcoder_v2lite_base-0.2-reworded      1       0.37      161          50              50
humaneval-pl-starcoder2_15b-0.2-reworded                 1       0.38      161          50              50
humaneval-rb-deepseekcoder_v2lite_base-0.2-reworded      1       0.40      161          50              50
humaneval-rb-starcoder2_15b-0.2-reworded                 1       0.42      161          50              50
humaneval-r-deepseekcoder_v2lite_base-0.2-reworded       1       0.31      161          50              50
humaneval-rkt-deepseekcoder_v2lite_base-0.2-reworded     1       0.22      161          50              50
humaneval-rkt-starcoder2_15b-0.2-reworded                1       0.26      161          50              50
humaneval-rs-deepseekcoder_v2lite_base-0.2-reworded      1       0.38      156          50              50
humaneval-rs-starcoder2_15b-0.2-reworded                 1       0.39      156          50              50
humaneval-r-starcoder2_15b-0.2-reworded                  1       0.25      161          50              50
humaneval-scala-deepseekcoder_v2lite_base-0.2-reworded   1       0.39      160          50              50
humaneval-scala-starcoder2_15b-0.2-reworded              1       0.41      160          50              50
humaneval-sh-deepseekcoder_v2lite_base-0.2-reworded      1       0.19      158          50              50
humaneval-sh-starcoder2_15b-0.2-reworded                 1       0.19      158          50              50
humaneval-swift-deepseekcoder_v2lite_base-0.2-reworded   1       0.36      158          50              50
humaneval-swift-starcoder2_15b-0.2-reworded              1       0.34      158          50              50
humaneval-ts-deepseekcoder_v2lite_base-0.2-reworded      1       0.47      159          50              50
humaneval-ts-starcoder2_15b-0.2-reworded                 1       0.43      159          50              50
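
For readers checking these numbers, here is a minimal sketch of how a Pass@k estimate like the ones above can be computed from per-problem completion counts. It assumes the Estimate column is the standard unbiased pass@k estimator averaged over problems; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from the MultiPL-E code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n completions, c of them correct.

    Assumption: this is the usual estimator 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        # Every size-k sample must contain at least one correct completion.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts, not data from the table: with 50 completions
    # per problem, pass@1 for a problem is simply c / 50.
    print(pass_at_k(50, 8, 1))
    print(mean_pass_at_k([(50, 50), (50, 0)], 1))
```

With k = 1 the estimator reduces to the fraction of correct completions per problem, so each row of the table is, under this assumption, the mean success rate over NumProblems problems at 50 completions each.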
arjunguha commented 3 months ago

I am not able to get Clojure to work in a Singularity container behind the Discovery HTTPS proxy. Getting it to work probably isn't worth it -- most people won't be using Singularity and HTTPS proxies. Anyway, for Discovery, I have a Conda environment (conda activate /work/arjunguha-research-group/arjun/conda/clojure) that should "just work", assuming you create a file ~/.m2/settings.xml with this content:

<settings>
  <proxies>
    <proxy>
      <id>example-proxy</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>10.99.0.130</host>
      <port>3128</port>
      <nonProxyHosts>localhost|127.0.0.1</nonProxyHosts>
    </proxy>
  </proxies>
</settings>
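
If you would rather generate that file than paste it, the following is a rough sketch that writes an equivalent settings.xml using only the Python standard library. The function name `write_maven_proxy_settings` and its parameters are hypothetical; the element names and values mirror the snippet above. It strips stray whitespace from each value, since a space inside `<host>` is easy to miss.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_maven_proxy_settings(
    path,
    host,
    port,
    protocol="https",
    non_proxy_hosts="localhost|127.0.0.1",
    proxy_id="example-proxy",
):
    """Write a Maven settings.xml containing a single active proxy entry."""
    settings = ET.Element("settings")
    proxies = ET.SubElement(settings, "proxies")
    proxy = ET.SubElement(proxies, "proxy")
    fields = [
        ("id", proxy_id),
        ("active", "true"),
        ("protocol", protocol),
        ("host", host),
        ("port", port),
        ("nonProxyHosts", non_proxy_hosts),
    ]
    for tag, value in fields:
        # Strip whitespace so values such as "10.99.0.130 " are cleaned up.
        ET.SubElement(proxy, tag).text = str(value).strip()
    ET.indent(settings)  # pretty-printing; requires Python 3.9 or newer
    target = Path(path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(ET.tostring(settings, encoding="unicode") + "\n")
    return target
```

Calling `write_maven_proxy_settings("~/.m2/settings.xml", "10.99.0.130", 3128)` would overwrite any existing settings file at that path, so back it up first if you already have one.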