Closed mgoin closed 6 months ago
Found through lm-evaluation-harness, which makes requests to the OpenAI Completions interface using raw token_ids
OAI docs reference: https://platform.openai.com/docs/api-reference/completions
prompt: The prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays.
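Since the `prompt` field can take four shapes, the server has to dispatch on the element type before tokenizing. A minimal sketch of that dispatch follows; `normalize_prompt` is a hypothetical helper for illustration, not deepsparse's actual code.

```python
from typing import List, Union

# The four shapes the OpenAI Completions API allows for `prompt`:
# a string, an array of strings, an array of tokens (ints),
# or an array of token arrays.
Prompt = Union[str, List[str], List[int], List[List[int]]]

def normalize_prompt(prompt: Prompt) -> List[Union[str, List[int]]]:
    """Flatten any allowed `prompt` shape into a list of single prompts,
    each either a string or a list of token ids."""
    if isinstance(prompt, str):
        return [prompt]                       # single string
    if isinstance(prompt, list):
        if not prompt:
            raise ValueError("prompt must not be empty")
        first = prompt[0]
        if isinstance(first, str):
            return list(prompt)               # array of strings
        if isinstance(first, int):
            return [list(prompt)]             # single array of tokens
        if isinstance(first, list):
            return [list(p) for p in prompt]  # array of token arrays
    raise TypeError(f"unsupported prompt type: {type(prompt)!r}")
```

Token-id prompts would then be decoded (or passed straight to the engine), while string prompts go through the tokenizer as before.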
Example request during gsm8k evaluation:
{'model': 'hf:mgoin/llama-2-7b-gsm8k-pruned60-quant-ds', 'prompt': [[1, 894, 29901, 21828, 18577, 29871, 29929, 29900, 9814, 273, 21425, 322, 29871, 29946, 29900, 28145, 5697, 348, 3173, 393, 9814, 273, 21425, 29889, 1128, 1784, 18281, 947, 540, 8024, 3001, 29973, 13, 22550, 29901]], 'max_tokens': 256, 'stop': ['<|endoftext|>'], 'seed': 1234, 'temperature': 0.0}
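Rebuilding that request body in code makes the failing input shape explicit: `prompt` is an array of token arrays, so the handler cannot assume it is a string. This is only a sketch of the payload (the token-id list is truncated here for brevity); the endpoint path is the standard OpenAI Completions route assumed from the server setup below.

```python
import json

# Same request body as the gsm8k example above, with the token-id
# prompt truncated for brevity.
payload = {
    "model": "hf:mgoin/llama-2-7b-gsm8k-pruned60-quant-ds",
    "prompt": [[1, 894, 29901, 21828]],  # array of token arrays, not a string
    "max_tokens": 256,
    "stop": ["<|endoftext|>"],
    "seed": 1234,
    "temperature": 0.0,
}

body = json.dumps(payload)
# e.g. requests.post("http://localhost:5543/v1/completions", data=body,
#                    headers={"Content-Type": "application/json"})
```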
Server command:
deepsparse.server --integration openai --task text-generation --model_path hf:mgoin/llama-2-7b-gsm8k-pruned60-quant-ds
Eval command (using this branch https://github.com/EleutherAI/lm-evaluation-harness/pull/1277):
lm_eval --model local-completions --model_args base_url=http://localhost:5543/v1,model=hf:mgoin/llama-2-7b-gsm8k-pruned60-quant-ds,tokenizer_backend=huggingface,tokenizer=mgoin/llama-2-7b-gsm8k-pruned60-quant-ds --tasks gsm8k --num_fewshot 0
Please add a test for this input case.
Very good call, I missed that file.