Fixes needed to make the gsm8k benchmark run against the Azure OpenAI GPT-4 chat model.
Also contains some additional small changes:
Removes the global prompt list in the gsm8k module, making it a scoped variable instead. We'll be doing something similar for the other scripts that follow this pattern.
Adds a generations/ directory, which will be used to store intermediate outputs going forward.
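
For reviewers, here is a rough sketch of what moving a module-level prompt list into function scope can look like. The names `build_prompts`, `run_benchmark`, and `generate` are hypothetical and are not taken from the gsm8k module; the prompt template is also only an assumption for illustration.

```python
from typing import Callable, List


def build_prompts(questions: List[str]) -> List[str]:
    # Hypothetical helper: the list is created on every call instead of
    # living at module level, so repeated runs cannot leak prompts into
    # each other.
    prompts: List[str] = []
    for question in questions:
        # Assumed template, for illustration only.
        prompts.append(f"Q: {question}\nA:")
    return prompts


def run_benchmark(questions: List[str], generate: Callable[[str], str]) -> List[str]:
    # `generate` stands in for the model call (an assumption, not the
    # actual Azure OpenAI client used by the benchmark).
    prompts = build_prompts(questions)
    return [generate(prompt) for prompt in prompts]
```

Because the list is local to each call, running the benchmark twice in one process starts from an empty prompt list both times.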