Open kisimoff opened 3 months ago
The docs definitely need a rewrite, my apologies here.
The general flow is:
1. `prepare.py`
2. `interview*.py`
3. `eval.py`
In the dark days we had to deal with dozens of prompt formats, but these days `prepare.py` can be run with `--chat hfmodel` and it will sort out the prompt format for you.
Note that there are two interviews, `junior-v2` and `senior`; I usually only run `senior` on strong models that score >90% on `junior`.
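To make the three-step flow above concrete, here is a rough sketch of what an end-to-end run might look like. The script names come from this thread; the specific `interview_*.py` filename and any flags beyond `--chat hfmodel` are illustrative assumptions, not verified against the current repo:

```shell
# 1) Render the interview prompts in the model's chat format.
#    --chat hfmodel is from the thread; the interview name is assumed.
python prepare.py --chat hfmodel

# 2) Run the interview against your model.
#    Pick whichever interview_*.py matches your backend; this
#    filename is a placeholder, not a confirmed script name.
python interview_somebackend.py

# 3) Score the model's answers.
python eval.py
```

Check each script's `--help` output for the exact arguments, since the flags vary by backend and may have changed since this was written.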
I'm willing to test a few models and share the results. I've looked at the README, but couldn't wrap my head around how to benchmark a model. Any help would be appreciated!