Closed siddhartha-gadgil closed 1 year ago
Two ways one can test performance:
We are working with a partial binary port of mathlib, so every example should work with this port, and ideally each should have related prompts in clean_prompts.json.
Stale: it is not clear what this means for what works now.
Current status
There are currently three files in this repository that together contain 120 test statements:
There is also a Lean program that can be run with various configurations to see how many of these statements elaborate. After following the setup in README.md, an example run attempts to elaborate all statements in silly-prompts.txt, using 10 example prompts chosen by sentence similarity and 4 chosen by keywords, with 15 Codex completions for each statement at temperature 0.8.
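To make the configuration above concrete, here is a minimal Python sketch of how such a run could be described and scored. It is not the repository's Lean program: the names `RunConfig` and `count_elaborated` are hypothetical, the sample outcomes are made up for illustration, and the scoring rule (a statement counts as elaborated if at least one of its completions elaborates) is an assumption about what "how many of these elaborate" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the settings named in the example run."""
    similarity_prompts: int = 10  # example prompts chosen by sentence similarity
    keyword_prompts: int = 4      # example prompts chosen by keywords
    completions: int = 15         # Codex completions requested per statement
    temperature: float = 0.8      # sampling temperature


def count_elaborated(results: dict[str, list[bool]]) -> int:
    """Count statements for which at least one completion elaborated.

    Assumption: one success among the completions is enough for a
    statement to count as elaborated.
    """
    return sum(1 for outcomes in results.values() if any(outcomes))


config = RunConfig()

# Made-up outcomes for three placeholder statements (not real results).
sample = {
    "statement-1": [False, True, False],
    "statement-2": [False, False, False],
    "statement-3": [True, True, False],
}

elaborated = count_elaborated(sample)
print(f"{elaborated} of {len(sample)} statements elaborated")
```

Keeping the settings in one immutable object makes it easy to compare runs that differ in only one parameter, for example the number of completions.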