roboflow / gpt-checkup

Monitor the performance of OpenAI's GPT-4V model over time.
https://www.gptcheckup.com

Restructure the tests into a folder + add tests #9

Closed stellasphere closed 9 months ago

stellasphere commented 9 months ago

Description

This PR moves the tests into a folder structure. The motivation was two-fold: to organize the main web.py file, which was growing in length and complexity with every test added, and to make the website dynamic, so that each test can have its own visual front-end.

Also added:

Type of change

How has this change been tested? Please provide a test case or example of how you tested the change.

Tested locally by successfully generating results/2023-11-26.json, results/2023-11-27.json and results/2023-11-28.json.

Any specific deployment considerations

Yes, significant ones: this changes how future tests are added.

Docs

capjamesg commented 9 months ago

This is a big improvement over how our tests are organized right now. Good job!

stellasphere commented 9 months ago

I can try modifying the prompts, but I don't understand what you mean when you say it limits our ability to programmatically evaluate them. I extract JSON components using a markdown code block regex so they're still evaluated properly.
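To make the regex approach concrete, here is a minimal sketch of pulling JSON out of a fenced markdown code block in a model reply. It is an illustration, not the code in this PR: the names `FENCE`, `CODE_BLOCK_RE` and `extract_json` are hypothetical, and the fallback to parsing the raw reply is an assumption about how unfenced answers might be handled.

```python
import json
import re

# Three backticks, built programmatically so this snippet can itself
# sit inside a markdown code fence.
FENCE = "`" * 3

# Hypothetical pattern: a fenced block, optionally tagged "json".
CODE_BLOCK_RE = re.compile(FENCE + r"(?:json)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_json(response_text):
    """Parse JSON from the first fenced code block in a model reply.

    Falls back to parsing the whole reply when no fence is found
    (an assumption for this sketch). Returns None if parsing fails.
    """
    match = CODE_BLOCK_RE.search(response_text)
    candidate = match.group(1) if match else response_text
    try:
        return json.loads(candidate.strip())
    except json.JSONDecodeError:
        return None


# Example reply with the JSON wrapped in a fenced block.
reply = "Here you go:\n" + FENCE + 'json\n{"count": 3}\n' + FENCE + "\nDone."
result = extract_json(reply)
```

With this shape, a test can compare `result` against the expected value whether or not the model wraps its answer in a code block, so the prompt wording does not have to forbid markdown.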