nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/
Other

Add Clojure programming language #136

Closed: puredanger closed this 6 months ago

puredanger commented 7 months ago

Hello, I'm a member of the Clojure core team, and we've been able to successfully use the MultiPL-E benchmark to translate and evaluate Clojure programs with the changes in this PR (well, to the extent that models generate useful Clojure code!). Please let me know if you have any feedback; we would love to see it added to your project. Thanks!

arjunguha commented 7 months ago

Will add! My goal is to put out another release this month, and we'll make sure Clojure is part of it. I'm curious -- have you tried to eval any particular models with it? How did they do?

puredanger commented 7 months ago

We have tried several models with it - they are mostly not great, I think due to the relative lack of training data in many of the open source model training sets. The Stack data set excludes EPL code, and most open source Clojure code is EPL, so I think that has affected this a lot. Hopefully this is an area where we can make some improvements.

cassanof commented 6 months ago

Hi @puredanger, thanks a lot for the PR, we are working on getting it merged! I have generated StarCoder2 completions on the HumanEval Clojure dataset. StarCoder2 is a new model that was trained on a bunch of Clojure projects (383,128 Clojure files). When I evaluate the completions with the container, the Clojure compiler throws an exception on every sample:

Error building classpath. Failed to read artifact descriptor for org.clojure:clojure:jar:1.11.2
org.eclipse.aether.resolution.ArtifactDescriptorException: Failed to read artifact descriptor for org.clojure:clojure:jar:1.11.2
        at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:255)
        at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.readArtifactDescriptor(DefaultArtifactDescriptorReader.java:171)
        at org.eclipse.aether.internal.impl.DefaultRepositorySystem.readArtifactDescriptor(DefaultRepositorySystem.java:263)
        at clojure.tools.deps.extensions.maven$read_descriptor.invokeStatic(maven.clj:115)
        at clojure.tools.deps.extensions.maven$fn__1320.invokeStatic(maven.clj:143)
        at clojure.tools.deps.extensions.maven$fn__1320.invoke(maven.clj:143)
        at clojure.lang.MultiFn.invoke(MultiFn.java:244)
        at clojure.tools.deps$expand_deps$children_task__928$fn__930$fn__931.invoke(deps.clj:407)
        at clojure.lang.AFn.applyToHelper(AFn.java:152

Do you know what this may be?

Here are the completions (along with the evaluations, ending in .results.json.gz): clojure-comps.tar.gz

Thanks, Federico

puredanger commented 6 months ago

This is from the Clojure CLI trying to download the Clojure implementation jar (which it should only need to do the first time it runs). The error is usually because either (a) it cannot reach the URL (at this point it should be trying to reach https://repo1.maven.org/maven2/org/clojure/clojure/1.11.2/clojure-1.11.2.pom), or (b) it cannot write to ~/.m2/repository, where it caches those files (this is all standard Maven/JVM ecosystem stuff).

Perhaps this download should be triggered in the Dockerfile so it doesn't happen during evaluation? I think it would just be one more instruction at the end of that section of the Dockerfile, which triggers the downloads but does nothing else:

RUN clojure -P

If the problem is write access to the cache directory, there are other solutions too, but maybe the above will be sufficient.
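To tell the two causes above apart before running an evaluation, a minimal pre-flight check could probe each one separately. This is a sketch, assuming `curl` and `mktemp` are available in the image; the function names `url_reachable`, `dir_writable`, and `preflight` are hypothetical, and only the POM URL and the cache path come from the comment above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the two failure causes described above:
#  (a) the Maven Central URL for the Clojure POM is unreachable, or
#  (b) the local Maven cache directory (~/.m2/repository) is not writable.
set -euo pipefail

CLOJURE_POM_URL="https://repo1.maven.org/maven2/org/clojure/clojure/1.11.2/clojure-1.11.2.pom"

# Returns 0 if an HTTP HEAD request to the URL succeeds within 10 seconds.
url_reachable() {
  curl -fsSI --max-time 10 "$1" > /dev/null
}

# Returns 0 if the directory exists (or can be created) and a file can be
# created inside it. Creates the directory as a side effect.
dir_writable() {
  local dir="$1"
  mkdir -p "$dir" 2>/dev/null || return 1
  local probe
  probe="$(mktemp "$dir/.write-probe.XXXXXX" 2>/dev/null)" || return 1
  rm -f "$probe"
}

# Runs both checks, reports each result, and returns non-zero if either fails.
preflight() {
  local repo="${1:-$HOME/.m2/repository}"
  local status=0
  if url_reachable "$CLOJURE_POM_URL"; then
    echo "ok: $CLOJURE_POM_URL is reachable"
  else
    echo "FAIL: cannot reach $CLOJURE_POM_URL (network or proxy?)" >&2
    status=1
  fi
  if dir_writable "$repo"; then
    echo "ok: $repo is writable"
  else
    echo "FAIL: cannot write to $repo" >&2
    status=1
  fi
  return "$status"
}
```

One caveat on the design: `curl` reads the `https_proxy` environment variable, whereas Maven takes its proxy configuration from `~/.m2/settings.xml`, so a passing reachability check does not by itself rule out a proxy problem for the Clojure CLI.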

cassanof commented 6 months ago

Now the evaluator works! Thanks for your input.

I also changed the way the prompts are being generated. I added indentation to the docstring and argument list, and also added a final indentation to force the model's generation toward finishing the function. I did two-space indentation; is that the canonical style for Clojure?

I reran StarCoder2-15b and got 23.85 pass@1. Here are the completions: clojure-comps-reran.tar.gz

If you'd like to see how this compares to other languages, page 24 of the StarCoder2 paper includes a MultiPL-E evaluation on all 18 languages we had in the original paper (https://arxiv.org/pdf/2402.19173.pdf). For instance, Racket gets 22.4 pass@1.
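For readers comparing these numbers: pass@k scores of this kind are conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021), which, as far as we know, is also what MultiPL-E uses. For each problem p in the problem set P, n_p completions are sampled and c_p of them pass the tests; for k = 1 the estimator reduces to the average fraction of passing completions:

```latex
\text{pass@}k = \frac{1}{|P|} \sum_{p \in P} \left[ 1 - \frac{\binom{n_p - c_p}{k}}{\binom{n_p}{k}} \right],
\qquad
\text{pass@}1 = \frac{1}{|P|} \sum_{p \in P} \left[ 1 - \frac{n_p - c_p}{n_p} \right] = \frac{1}{|P|} \sum_{p \in P} \frac{c_p}{n_p}
```

For example, a problem with n_p = 20 completions of which c_p = 5 pass contributes 5/20 = 0.25 to the average; a score such as 23.85 is that average expressed as a percentage.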

Overall, this is a pretty good result. If you'd like to improve on this number, I suggest you read our MultiPL-T paper, where we use the MultiPL-E translators to generate fine-tuning sets for these languages: https://arxiv.org/abs/2308.09895

arjunguha commented 6 months ago

Agreed, we can probably push this number higher with MultiPL-T at some point.

arjunguha commented 3 months ago

@puredanger I'm trying to put out a new release of MultiPL-E with Clojure. I have two problems:

  1. Lots of timeouts (which I assume I can just fix by giving it more time)
  2. These errors:
[a.guha@d0130 humaneval-clj-starcoder2_15b-0.2-reworded]$ zgrep "cannot create" *.results.json.gz
HumanEval_0_has_close_elements.results.json.gz:      "stderr": "cp: cannot create regular file '/home/a.guha/.clojure/tools/tools.edn': File exists\n",
HumanEval_0_has_close_elements.results.json.gz:      "stderr": "cp: cannot create regular file '/home/a.guha/.clojure/deps.edn': File exists\n",
HumanEval_0_has_close_elements.results.json.gz:      "stderr": "cp: cannot create regular file '/home/a.guha/.clojure/deps.edn': File exists\n",
HumanEval_0_has_close_elements.results.json.gz:      "stderr": "cp: cannot create regular file '/home/a.guha/.clojure/deps.edn': File exists\n",
HumanEval_0_has_close_elements.results.json.gz:      "stderr": "cp: cannot create regular file '/home/a.guha/.clojure/deps.edn': File exists\n",
HumanEval_0_has_close_elements.results.json.gz:      "stderr": "cp: cannot create regular file '/home/a.guha/.clojure/tools/tools.edn': File exists\n",
HumanEval_0_has_close_elements.results.json.gz:      "stderr": "cp: cannot create regular file '/home/a.guha/.clojure/tools/tools.edn': File exists\n",

MultiPL-E runs several tests concurrently, and I'm on a 48 core machine. I'm going to make a guess: this is Clojure trying to download part of its toolchain, which you and @cassanof were trying to address earlier in this thread. I see that clojure -P was added to the Dockerfile earlier, but I guess that wasn't enough:

https://github.com/nuprl/MultiPL-E/blob/main/evaluation/Dockerfile#L73

Any ideas?
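If the `cp` errors come from many concurrent first-time invocations of the `clojure` launcher all trying to create `~/.clojure/deps.edn` and `~/.clojure/tools/tools.edn` (an assumption, not something confirmed in this thread), one mitigation is to run the CLI once, serially, before the parallel evaluation starts. The sketch below pairs that warm-up with a small counter for checking how many result files still report a given error; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch, assuming the "cp: cannot create regular file ... File exists" errors
# come from parallel first runs of the clojure launcher racing on ~/.clojure.
set -euo pipefail

# Run the Clojure CLI once, serially, before any parallel evaluation starts.
# `clojure -P` downloads dependencies without running anything; the launcher
# should also create its user config files on this first invocation.
warm_clojure_once() {
  clojure -P
}

# Print the number of lines matching PATTERN across all *.results.json.gz
# files in DIR (0 if there are none).
count_result_matches() {
  local pattern="$1" dir="$2"
  local f n total=0
  for f in "$dir"/*.results.json.gz; do
    [[ -e "$f" ]] || continue   # the glob matched nothing
    # grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps that 0.
    n="$(gzip -dc "$f" | grep -c -- "$pattern" || true)"
    total=$((total + n))
  done
  echo "$total"
}
```

For example, running `count_result_matches "cannot create" humaneval-clj-starcoder2_15b-0.2-reworded` before and after adding the warm-up step would show whether the errors went away.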

arjunguha commented 3 months ago

Figured it out. FYI:

I'm in a slurm-based cluster environment where compute nodes require an HTTP/S proxy. The system environment variables are set up so that most programs use it, but I needed to create a ~/.m2/settings.xml file for Maven. (Maven seems to ignore JAVA_OPTS.)
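A minimal sketch of generating such a `~/.m2/settings.xml` from the proxy variable the cluster already sets. Assumptions: the proxy URL has the form `http://[user:pass@]host[:port]`, no proxy authentication is needed (credentials are dropped), and the function name `write_m2_proxy_settings` is hypothetical. The `<proxies>`/`<proxy>` elements are standard Maven settings; two entries are written so that both `http` and `https` repository URLs are covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal Maven settings.xml that sends Maven
# (and therefore Clojure CLI) downloads through the proxy in $https_proxy.
set -euo pipefail

# Usage: write_m2_proxy_settings [PROXY_URL] [OUTPUT_PATH]
write_m2_proxy_settings() {
  local proxy="${1:-${https_proxy:-${HTTPS_PROXY:-}}}"
  local out="${2:-$HOME/.m2/settings.xml}"
  if [[ -z "$proxy" ]]; then
    echo "no proxy URL given and https_proxy is unset" >&2
    return 1
  fi
  if [[ -e "$out" ]]; then
    echo "$out already exists; not overwriting" >&2
    return 1
  fi
  local hostport="${proxy#*://}"  # drop the scheme, e.g. "http://"
  hostport="${hostport%%/*}"      # drop any trailing path
  hostport="${hostport##*@}"      # drop "user:pass@" (credentials not copied)
  local host="$hostport" port=80  # assume port 80 if none is given
  if [[ "$hostport" == *:* ]]; then
    host="${hostport%:*}"
    port="${hostport##*:}"
  fi
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<EOF
<settings>
  <proxies>
    <proxy>
      <id>cluster-https</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>${host}</host>
      <port>${port}</port>
    </proxy>
    <proxy>
      <id>cluster-http</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>${host}</host>
      <port>${port}</port>
    </proxy>
  </proxies>
</settings>
EOF
}
```

The function refuses to overwrite an existing settings file, so an existing Maven configuration is never clobbered; calling `write_m2_proxy_settings "$https_proxy"` once before submitting jobs is enough when the home directory is shared across nodes.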

Thanks to Claude for teaching me about this.

puredanger commented 3 months ago

Yep, that sounds right