Why
We want to support evaluation for ReplitLM models with the bigcode-evaluation-harness.
What changed
We add an `evaluation/` directory with code to run evaluation with the harness:

- `evaluation/README.md`: instructions and steps on how to set up and run evaluation
- `evaluation/eval.py`: a fork of the original harness `main.py`, with patches for tokenizer decoding and flash attention (see the sketch after this list)
- `evaluation/scripts`: bash scripts to help parameterize and run `eval.py`
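The two patches line up with what the public ReplitLM model card prescribes for loading and decoding. A minimal sketch, assuming the fork loads the checkpoint through Hugging Face `transformers`; the `replit/replit-code-v1-3b` checkpoint name is illustrative, not taken from this PR:

```python
# Sketch of the two patches, following the public ReplitLM model card;
# an assumed reconstruction, not the PR's exact code.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

MODEL = "replit/replit-code-v1-3b"  # illustrative checkpoint name

# Flash-attention patch: switch the attention implementation to the
# triton kernel before instantiating the model.
config = AutoConfig.from_pretrained(MODEL, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"
model = AutoModelForCausalLM.from_pretrained(
    MODEL, config=config, trust_remote_code=True
)

# Tokenizer-decoding patch: skip special tokens and leave whitespace
# untouched, since generated code is whitespace-sensitive.
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)

def decode(token_ids):
    return tokenizer.decode(
        token_ids,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )
```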
Test plan
Ran and reproduced numbers.
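For context, a reproduction run through the fork might look like the sketch below. The flags are the upstream bigcode-evaluation-harness CLI's, assumed to be kept by `eval.py`; the task and sampling values are placeholders, not the PR's actual settings.

```python
# Hypothetical reproduction run; assumes evaluation/eval.py keeps the
# upstream bigcode-evaluation-harness flags. Values are placeholders.
import subprocess

subprocess.run(
    [
        "python", "evaluation/eval.py",
        "--model", "replit/replit-code-v1-3b",
        "--tasks", "humaneval",
        "--n_samples", "100",
        "--temperature", "0.2",
        "--allow_code_execution",
    ],
    check=True,
)
```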
Rollout