andrewssobral closed this 5 months ago
The first iteration of the reproducibility score works as expected. I am approving this merge.
Future tasks (tagging you @andrewssobral and @carolinepacheco):
Change the score scale from quantitative (1-10) to qualitative, following the schema below:
How it works for now:
It was tested with the following providers: OpenAI, Gemini, Ollama and Groq.
README contents of the above test: