Closed jelber2 closed 10 months ago
Need to figure out the easiest way to install Zig and what version to use for the container.
Here is a shell script that installs Zig from a prebuilt tarball; it could certainly be adapted to install it in the container:
```bash
#!/bin/bash
ZIG_TARBALL="zig-linux-x86_64-0.12.0-dev.167+dd6a9caea.tar.xz"
ZIG_DIR="/zig"
wget https://ziglang.org/builds/$ZIG_TARBALL && \
mkdir -p $ZIG_DIR && \
tar xf $ZIG_TARBALL -C $ZIG_DIR --strip-components 1
```
You would then need to either install the /zig/zig executable somewhere in the PATH, or update PATH to point to /zig.
Where I found the tarball: https://ziglang.org/download/
Thanks @JohnGouwar. I had been using https://github.com/tristanisham/zvm, which is convenient for specific versions, but the approach you cite is fine - you just need to point it at whatever version you want. One big problem with Zig is that it is not stable yet, and there have been many breaking changes, which may be more or less of an issue depending on the LLM's training data cutoff.
@andrewrk what version of Zig would you recommend for writing HumanEval, given that the CodeLlama and StarCoder training data likely cut off around Zig 0.8.0 - 0.9.0?
@jelber2 Glad to help. My intuition with the tarball approach is that it's probably easier to do non-interactively in a container (though I don't have any personal experience with zvm, so that intuition may be inaccurate). I'm not familiar with Zig, but what you could try is to generate ~20 completions with starcoderbase-1b on translated Zig HumanEval prompts, run evaluation on multiple versions, and see if you get different results (i.e. the same program fails in one version but not another) to see which version seems best to use in the container. I imagine that the language features tested by HumanEval and MBPP should be relatively stable.
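A minimal sketch of that comparison loop (the helper names and version labels are hypothetical; it assumes each candidate compiler is invoked as `zig test file.zig`, which is the standard Zig test runner invocation):

```python
import subprocess

def run_versions(programs, zig_paths, timeout=60):
    """Run `zig test` on each program under each candidate compiler.

    programs: list of .zig file paths.
    zig_paths: {version label: path to that version's zig binary}.
    Returns {program: {version label: True if the tests passed}}.
    """
    results = {}
    for prog in programs:
        results[prog] = {}
        for label, zig in zig_paths.items():
            try:
                proc = subprocess.run([zig, "test", prog],
                                      capture_output=True, timeout=timeout)
                results[prog][label] = proc.returncode == 0
            except subprocess.TimeoutExpired:
                results[prog][label] = False
    return results

def disagreements(results):
    """Programs whose pass/fail status differs between versions."""
    return [p for p, r in results.items() if len(set(r.values())) > 1]
```

A version under which few of the generated programs disagree with the others would be the safer pick for the container.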
Had to make a small merge for ongoing work, just fixed small import conflicts.
Ok, I'll give this a try when I have a block of time.
Need to do some extensive work on humaneval_to_zig.py. I generated tests with starcoderbase-3b, and they failed using the evaluation container. I had just modified the code from humaneval_to_cpp.py for the types in Zig without really looking at the JSON outputs until now. Getting humaneval_to_zig.py to work will take some time.
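For illustration, the kind of type translation involved might look like the following; the mappings and the function name are assumptions for this sketch, not the actual humaneval_to_zig.py code:

```python
# Hypothetical Python-annotation-to-Zig type mapping, in the spirit of the
# type translation humaneval_to_cpp.py does for C++; not the project's code.
PRIMITIVES = {"int": "i64", "float": "f64", "bool": "bool", "str": "[]const u8"}

def zig_type(py_type: str) -> str:
    py_type = py_type.strip()
    if py_type in PRIMITIVES:
        return PRIMITIVES[py_type]
    if py_type.startswith("List[") and py_type.endswith("]"):
        # Zig slices play the role of Python lists here
        return "[]const " + zig_type(py_type[5:-1])
    raise ValueError(f"no Zig mapping for {py_type!r}")
```

The hard part is less the type names than the test conversion: Zig has no exceptions and compares slices by pointer, so equality assertions need different handling than in C++.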
I have pretty much given up on this, as I do not have the time to read through the Python code for generating the prompts.
Was able to use dataset_builder/all_prepare_prompts.py to make prompts for Zig. Tests will probably fail, as I still need to figure out the proper Python-to-Zig test conversion in dataset_builder/humaneval_to_zig.py and the correct format for dataset_builder/terms.csv. I had to convert jsonl to json for automodel.py to work, but it seems that might be taken care of in commit 0adb7a42f95996e4000c31dfb5e48cd4ac571762.
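The jsonl-to-json conversion mentioned above takes only a few lines of Python (the function name is mine; it just wraps the line-delimited records in a single JSON array):

```python
import json

def jsonl_to_json(jsonl_path, json_path):
    # one JSON object per line in, a single JSON array out
    with open(jsonl_path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    with open(json_path, "w") as out:
        json.dump(records, out, indent=2)
```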