lucemia / evals-llama

Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.
MIT License

Use evals to evaluate LLaMA #1

Open · lucemia opened 1 year ago

lucemia commented 1 year ago

https://github.com/lucemia/evals-llama/blob/a7fe8e0ac5c4e2b71975bef5db10d73f64996b0f/evals/elsuite/basic/match.py#L28-L38
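
For context, the permalinked lines are the per-sample logic of the Match eval. A rough paraphrase of that region, based on upstream openai/evals (few-shot handling omitted; treat it as a sketch, not the exact code at this commit):

```python
# Sketch of Match.eval_sample, paraphrased from upstream openai/evals;
# the few-shot branch is omitted and details may differ at this commit.
def eval_sample(self, sample, *_):
    prompt = sample["input"]  # chat-style message list from samples.jsonl
    # Sample the model on the prompt and record a match if the completion
    # agrees with the expected "ideal" answer.
    return evals.check_sampled_text(self.model_spec, prompt, expected=sample["ideal"])
```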

lucemia commented 1 year ago
```
> /Users/chienhsundavidchen/repo/evals-llama/evals/cli/oaieval.py(204)run()
-> result = eval.run(recorder)
(Pdb) eval
<evals.elsuite.basic.match.Match object at 0x13043fbe0>
(Pdb) eval.run
<bound method Match.run of <evals.elsuite.basic.match.Match object at 0x13043fbe0>>
(Pdb) eval_spec
EvalSpec(cls='evals.elsuite.basic.match:Match', args={'samples_jsonl': 'test_match/samples.jsonl'}, key='test-match.s1.simple-v0', group='test-basic')
(Pdb) args
Namespace(model='gpt-3.5-turbo', eval='test-match', embedding_model='', ranking_model='', extra_eval_params='', max_samples=None, cache=True, visible=None, seed=20220722, user='', record_path=None, log_to_file=None, debug=False, local_run=True, dry_run=False, dry_run_logging=True)
```
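
(For reference, a Namespace like this is what `oaieval` parses from an invocation such as `oaieval gpt-3.5-turbo test-match`, the usage shown in the upstream evals README; the breakpoint sits where `run()` hands the built eval over to `Match.run`.)
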
lucemia commented 1 year ago
sample:

```
{'input': [{'role': 'system', 'content': 'Complete the phrase as concisely as possible.'}, {'role': 'user', 'content': 'OpenAI was founded in 20'}], 'ideal': '15'}
```

```
(Pdb) self.model_spec
ModelSpec(name='gpt-3.5-turbo', model='gpt-3.5-turbo', is_chat=True, encoding=None, organization=None, api_key=None, extra_options={}, headers={}, strip_completion=True, n_ctx=4096, format=None, key=None, group=None)
```
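
Since the goal of this issue is to run LLaMA instead of gpt-3.5-turbo, one obvious experiment is to hand Match a ModelSpec that points at a local model. A minimal sketch, assuming ModelSpec is importable from evals.base and accepts these fields as keywords (the field names are copied from the repr above; `llama-7b` is a placeholder, and whether the fork's completion path accepts a non-OpenAI model is exactly what this issue is probing):

```python
from evals.base import ModelSpec  # import path is an assumption

# Illustrative only: a spec for a local LLaMA model.
llama_spec = ModelSpec(
    name="llama-7b",   # placeholder name
    model="llama-7b",
    is_chat=False,     # base LLaMA is a plain completion model, unlike gpt-3.5-turbo
    n_ctx=2048,        # LLaMA-7B context window
)
```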