ruiAzevedo19 closed 2 days ago
Please try this out with the cheapest model from openrouter and post the CSV here so we can see what it looks like.
@bauersimon These are the results
eval-dev-quality evaluate --runs 1 --repository golang/plain --model openrouter/meta-llama/llama-3-8b-instruct
Awesome. The cost in the log is kinda useless, but it should be higher for more expensive models anyway. We just need to remember to scale them up for our evaluations then.
Part of #210