neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[OpenAI] Add logprob support #1522

Closed dsikka closed 5 months ago

dsikka commented 6 months ago

Summary

Testing / Question?

Local Testing

num_cores: 2
num_workers: 2
integration: openai
endpoints:
  - task: text_generation
    model: "hf:mgoin/TinyStories-1M-ds"
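Assuming the YAML above is saved as config.yaml, the server can presumably be launched with something like the following (the flag name is an assumption; check deepsparse.server --help):

```shell
# Launch the deepsparse server with the config above.
# Assumes the YAML is saved as config.yaml; flag name per deepsparse.server --help.
deepsparse.server --config-file config.yaml
```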

Client Code:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:5543/v1", api_key="EMPTY")

models = client.models.list()

model = "hf:mgoin/TinyStories-1M-ds"
print(f"Accessing model API '{model}'")

# Completion API
stream = True
completion = client.completions.create(
    prompt="The dog",
    max_tokens=10,
    stream=stream,
    model=model,
    logprobs=True
)

for c in completion:
    print(c)

Output:

Accessing model API 'hf:mgoin/TinyStories-1M-ds'
Completion(id='cmpl-67b04e6e35a84efd9950438d47c1794c', choices=[CompletionChoice(finish_reason=None, index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-1.5180178880691528], tokens=[' very'], top_logprobs=None), text=' very')], created=1705011569, model='/home/dsikka/.cache/huggingface/hub/models--mgoin--TinyStories-1M-ds/snapshots/eba4f1f16d00041d78c8f8a3bba5d3fa5afe5513/model.onnx', object='text_completion', system_fingerprint=None, usage=None)
Completion(id='cmpl-67b04e6e35a84efd9950438d47c1794c', choices=[CompletionChoice(finish_reason=None, index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-1.396243691444397], tokens=[' scared'], top_logprobs=None), text=' scared')], created=1705011569, model='/home/dsikka/.cache/huggingface/hub/models--mgoin--TinyStories-1M-ds/snapshots/eba4f1f16d00041d78c8f8a3bba5d3fa5afe5513/model.onnx', object='text_completion', system_fingerprint=None, usage=None)
Completion(id='cmpl-67b04e6e35a84efd9950438d47c1794c', choices=[CompletionChoice(finish_reason=None, index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-0.6142183542251587], tokens=['.'], top_logprobs=None), text='.')], created=1705011569, model='/home/dsikka/.cache/huggingface/hub/models--mgoin--TinyStories-1M-ds/snapshots/eba4f1f16d00041d78c8f8a3bba5d3fa5afe5513/model.onnx', object='text_completion', system_fingerprint=None, usage=None)
Completion(id='cmpl-67b04e6e35a84efd9950438d47c1794c', choices=[CompletionChoice(finish_reason=None, index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-0.7276340126991272], tokens=[' He'], top_logprobs=None), text=' He')], created=1705011569, model='/home/dsikka/.cache/huggingface/hub/models--mgoin--TinyStories-1M-ds/snapshots/eba4f1f16d00041d78c8f8a3bba5d3fa5afe5513/model.onnx', object='text_completion', system_fingerprint=None, usage=None)
Completion(id='cmpl-67b04e6e35a84efd9950438d47c1794c', choices=[CompletionChoice(finish_reason=None, index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-2.1691243648529053], tokens=[' wanted'], top_logprobs=None), text=' wanted')], created=1705011569, model='/home/dsikka/.cache/huggingface/hub/models--mgoin--TinyStories-1M-ds/snapshots/eba4f1f16d00041d78c8f8a3bba5d3fa5afe5513/model.onnx', object='text_completion', system_fingerprint=None, usage=None)
Completion(id='cmpl-67b04e6e35a84efd9950438d47c1794c', choices=[CompletionChoice(finish_reason=None, index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-0.06524639576673508], tokens=[' to'], top_logprobs=None), text=' to')], created=1705011569, model='/home/dsikka/.cache/huggingface/hub/models--mgoin--TinyStories-1M-ds/snapshots/eba4f1f16d00041d78c8f8a3bba5d3fa5afe5513/model.onnx', object='text_completion', system_fingerprint=None, usage=None)
Completion(id='cmpl-67b04e6e35a84efd9950438d47c1794c', choices=[CompletionChoice(finish_reason=None, index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-2.3006293773651123], tokens=[' go'], top_logprobs=None), text=' go')], created=1705011569, model='/home/dsikka/.cache/huggingface/hub/models--mgoin--TinyStories-1M-ds/snapshots/eba4f1f16d00041d78c8f8a3bba5d3fa5afe5513/model.onnx', object='text_completion', system_fingerprint=None, usage=None)
Completion(id='cmpl-67b04e6e35a84efd9950438d47c1794c', choices=[CompletionChoice(finish_reason=None, index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-1.681646466255188], tokens=[' back'], top_logprobs=None), text=' back')], created=1705011569, model='/home/dsikka/.cache/huggingface/hub/models--mgoin--TinyStories-1M-ds/snapshots/eba4f1f16d00041d78c8f8a3bba5d3fa5afe5513/model.onnx', object='text_completion', system_fingerprint=None, usage=None)
Completion(id='cmpl-67b04e6e35a84efd9950438d47c1794c', choices=[CompletionChoice(finish_reason=None, index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-0.7840849757194519], tokens=[' to'], top_logprobs=None), text=' to')], created=1705011569, model='/home/dsikka/.cache/huggingface/hub/models--mgoin--TinyStories-1M-ds/snapshots/eba4f1f16d00041d78c8f8a3bba5d3fa5afe5513/model.onnx', object='text_completion', system_fingerprint=None, usage=None)
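Each streamed chunk above carries exactly one token and its logprob, so a client can stitch the stream back together. A minimal pure-Python sketch, using the tokens and token_logprobs copied from the output above:

```python
# Each streamed chunk carries one (token, logprob) pair; accumulate them into
# the full completion text and a total log-probability for the sequence.
# Values copied from the streamed output above.
chunks = [
    (" very", -1.5180178880691528),
    (" scared", -1.396243691444397),
    (".", -0.6142183542251587),
    (" He", -0.7276340126991272),
    (" wanted", -2.1691243648529053),
    (" to", -0.06524639576673508),
    (" go", -2.3006293773651123),
    (" back", -1.681646466255188),
    (" to", -0.7840849757194519),
]

text = "".join(tok for tok, _ in chunks)          # -> " very scared. He wanted to go back to"
total_logprob = sum(lp for _, lp in chunks)       # log P(completion | prompt)

print(text)
print(total_logprob)
```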
mgoin commented 6 months ago

In between some concurrent work between deepsparse and lm-eval to test the OpenAI server, I ran into an issue: top_logprobs isn't implemented. Here is the vLLM create_logprobs function that I tried hacking into your diff, but I got stuck without the top_logprobs input: https://github.com/vllm-project/vllm/blob/827cbcd37c464452b79956fa4a564199e6c0ab6a/vllm/entrypoints/openai/api_server.py#L208-L240C20
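For context, the shape that a create_logprobs-style helper produces is parallel lists: tokens, their logprobs, text offsets, and (when requested) one dict per position mapping the most likely candidate tokens to their logprobs. A hypothetical sketch of that transformation, assuming the engine can supply a per-position {token: logprob} candidate mapping (the missing top_logprobs input described above); function name and inputs are illustrative, not deepsparse's API:

```python
# Hypothetical sketch of building an OpenAI-style logprobs payload. Assumes the
# engine supplies, per generated position, a dict mapping candidate tokens to
# their log-probabilities (the missing "top_logprobs input").
def create_logprobs_sketch(chosen_tokens, per_position_candidates, num_top=None):
    tokens, token_logprobs, top_logprobs, text_offset = [], [], [], []
    offset = 0
    for tok, candidates in zip(chosen_tokens, per_position_candidates):
        tokens.append(tok)
        token_logprobs.append(candidates[tok])  # logprob of the sampled token
        text_offset.append(offset)              # char offset of tok in the text
        offset += len(tok)
        if num_top is not None:
            # keep only the num_top most likely candidates at this position
            ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
            top_logprobs.append(dict(ranked[:num_top]))
    return {
        "tokens": tokens,
        "token_logprobs": token_logprobs,
        "top_logprobs": top_logprobs if num_top is not None else None,
        "text_offset": text_offset,
    }

# Toy example with made-up candidate distributions:
out = create_logprobs_sketch(
    [" He", " ran"],
    [{" He": -0.5, " She": -1.2, " It": -2.0},
     {" ran": -0.7, " walked": -1.1}],
    num_top=2,
)
```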

Here is the request, response, and traceback I got within lm-eval-harness. Commands:

deepsparse.server --integration openai --task text-generation --model_path hf:mgoin/TinyStories-1M-ds
lm_eval --model local-completions --model_args base_url=http://localhost:5543/v1,model=hf:mgoin/TinyStories-1M-ds,tokenizer_backend=huggingface,tokenizer=mgoin/TinyStories-1M-ds --tasks hellaswag --num_fewshot 0
REQUEST = {'model': 'hf:mgoin/TinyStories-1M-ds', 'prompt': [[41183, 290, 14620, 25, 1374, 284, 910, 23748, 287, 410, 1155, 22678, 13, 13816, 366, 2124, 259, 442, 24247, 78, 366, 355, 257, 2276, 31933, 13, 1002, 345, 691, 2193, 530, 410, 1155, 22678, 31933, 11, 366, 2124, 259, 442, 24247, 78, 366, 561, 1884, 307, 262, 1266, 31933, 284, 3853, 13, 350, 1313, 8652, 366, 2124, 259, 442, 24247, 78, 366, 355, 25, 7813, 474, 322, 262, 1573, 366, 442, 24247, 78, 366, 1724, 366, 23748, 366, 287, 46932, 11, 475, 345, 561, 8365, 779, 340, 3436, 13, 220, 17106, 11, 12581, 6428, 366, 627, 2634, 366, 1724, 366, 23748, 366, 287, 410, 1155, 22678, 11, 475, 345, 743, 635, 779, 340, 355, 366, 627, 2634, 13, 366, 329, 46932, 11636, 11, 340, 318, 16293, 366, 1575, 64, 2356, 64, 442, 24247, 78, 12, 421, 2634, 13]], 'echo': True, 'max_tokens': 0, 'temperature': 0.0, 'logprobs': 10, 'seed': 1234}

2024-01-12:21:52:50,368 INFO     [_client.py:1027] HTTP Request: POST http://localhost:5543/v1/completions "HTTP/1.1 200 OK"

RESPONSE = CompletionChoice(finish_reason='stop', index=None, logprobs=Logprobs(text_offset=[], token_logprobs=[-1.3718655109405518, -1.6813075542449951, -0.5835205912590027, -2.3857421875, -1.1850147247314453, -1.4327609539031982, -1.0862425565719604, -1.3265399932861328, -0.01401725597679615, -0.7671912908554077, -1.8408502340316772, -0.02035493403673172, -0.805853545665741, -1.3963234424591064, -0.6728101372718811, -0.28462377190589905, -0.18176312744617462, -1.6125370264053345, -0.8710795640945435, -1.1399714946746826, -0.4823414087295532, -1.6222418546676636, -1.6073075532913208, -0.9253568649291992, -0.28968146443367004, -2.509056568145752, -1.207929015159607, -1.9377094507217407, -0.39531823992729187, -0.5303062200546265, -1.6694589853286743, -0.24369390308856964, -1.382930040359497, -0.42184001207351685, -2.2233667373657227, -0.8875962495803833, -1.6202012300491333, -1.9755644798278809, -2.335638999938965, -1.3909152746200562, -2.493443489074707, -0.8141875267028809, -1.8022541999816895, -1.3976320028305054, -2.528510570526123, -0.4395048916339874, -0.8453584313392639, -0.008289567194879055, -0.012714286334812641, -1.3391444683074951, -0.009850701317191124, -1.1681102514266968, -1.4828898906707764, -1.0213128328323364, -0.8255200982093811, -1.1556187868118286, -2.5673587322235107, -0.07395735383033752, -2.212928295135498, -0.856827974319458, -1.0892446041107178, -0.6976301074028015, -0.41025859117507935, -2.5871167182922363, -0.6402055025100708, -1.857731580734253, -0.3513781726360321, -0.28716686367988586, -0.880348801612854, -2.105755090713501, -0.255537211894989, -2.720140218734741, -0.19674457609653473, -1.0769606828689575, -1.0204941034317017, -0.6762701869010925, -0.1618189811706543, -1.2297614812850952, -1.4207383394241333, -1.2299003601074219, -0.49282366037368774, -1.246532917022705, -1.6099696159362793, -2.2601776123046875, -1.465539813041687, -0.001021907082758844, -2.3649849891662598, -0.3639565706253052, -0.93106609582901, -0.9325934052467346, 
-1.520829439163208, -0.8304876089096069, -0.01390259712934494, -0.010546673089265823, -1.4068495035171509, -0.8410376310348511, -0.032891422510147095, -0.011724268086254597], tokens=[' You', ' can', "'t", ' take', ' it', ' away', '".', ' ', '\n', '\n', 'Sam', 'my', ' was', ' sad', ',', ' but', ' he', ' knew', ' he', ' had', ' to', ' be', ' careful', '.', ' He', ' said', ' "', 'No', ',', ' I', ' can', "'t", ' do', ' it', '.', ' I', ' will', ' be', ' careful', ' and', ' try', ' to', ' make', ' it', ' better', '."', ' ', '\n', '\n', 'Sam', 'my', ' was', ' very', ' sad', ' and', ' he', ' decided', ' to', ' take', ' a', ' break', '.', ' He', ' put', ' the', ' band', 'age', ' on', ' the', ' floor', ' and', ' put', ' it', ' in', ' his', ' pocket', '.', ' He', ' was', ' so', ' happy', ' and', ' he', ' was', ' able', ' to', ' play', ' with', ' the', ' band', '.', ' ', '\n', '\n', 'The', ' end', '.', '\n'], top_logprobs=None), text=' You can\'t take it away". \n\nSammy was sad, but he knew he had to be careful. He said "No, I can\'t do it. I will be careful and try to make it better." \n\nSammy was very sad and he decided to take a break. He put the bandage on the floor and put it in his pocket. He was so happy and he was able to play with the band. \n\nThe end.\n')

Traceback (most recent call last):
  File "/home/mgoin/venvs/clip-ret/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/mgoin/code/lm-evaluation-harness-mgoin/lm_eval/__main__.py", line 231, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/mgoin/code/lm-evaluation-harness-mgoin/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
  File "/home/mgoin/code/lm-evaluation-harness-mgoin/lm_eval/evaluator.py", line 150, in simple_evaluate
    results = evaluate(
  File "/home/mgoin/code/lm-evaluation-harness-mgoin/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
  File "/home/mgoin/code/lm-evaluation-harness-mgoin/lm_eval/evaluator.py", line 325, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/home/mgoin/code/lm-evaluation-harness-mgoin/lm_eval/models/openai_completions.py", line 201, in loglikelihood
    return self._loglikelihood_tokens(new_reqs)
  File "/home/mgoin/code/lm-evaluation-harness-mgoin/lm_eval/models/openai_completions.py", line 248, in _loglikelihood_tokens
    answer = get_result(resp, ctxlen)
  File "/home/mgoin/code/lm-evaluation-harness-mgoin/lm_eval/models/openai_completions.py", line 35, in get_result
    top_tokens = response.logprobs.top_logprobs[i]
TypeError: 'NoneType' object is not subscriptable
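For reference, lm-eval's get_result (the frame at the bottom of the traceback) needs top_logprobs to be a list with one dict per token position: it sums the continuation logprobs and checks whether each continuation token was the argmax at its position. A simplified sketch of that logic with hypothetical data, showing why top_logprobs=None blows up:

```python
# Simplified sketch of lm-eval's get_result logic. It needs top_logprobs to be
# a per-token list of {token: logprob} dicts; with top_logprobs=None, the
# subscript raises TypeError exactly as in the traceback above.
def get_result_sketch(token_logprobs, tokens, top_logprobs, ctxlen):
    # sum of logprobs over the continuation (everything past the context)
    continuation_logprob = sum(token_logprobs[ctxlen:])
    is_greedy = True
    for i in range(ctxlen, len(tokens)):
        top_choices = top_logprobs[i]  # TypeError here when top_logprobs is None
        top_token = max(top_choices, key=top_choices.get)
        if top_token != tokens[i]:
            is_greedy = False
            break
    return continuation_logprob, is_greedy

# Hypothetical 3-token response with a 1-token context:
lp = [-0.1, -0.5, -2.0]
toks = ["A", "B", "C"]
tops = [{"A": -0.1}, {"B": -0.5, "X": -0.9}, {"Y": -1.0, "C": -2.0}]
score, greedy = get_result_sketch(lp, toks, tops, ctxlen=1)
```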
dsikka commented 6 months ago

This is ready for merge apart from support for the top_logprobs field. We need clarification on what we're actually returning for this field.
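For what it's worth, in the legacy OpenAI Completions API the integer logprobs=N request parameter (lm-eval sends 'logprobs': 10 above) asks for, per returned token, a dict of roughly the N most likely tokens at that position mapped to their logprobs, and clients index the resulting list by token position. A minimal illustration of the expected shape (all token strings and values hypothetical):

```python
# Hypothetical illustration of the shape consumers expect for top_logprobs:
# one dict per token position, mapping candidate token -> logprob.
tokens = [" You", " can"]
top_logprobs = [
    {" You": -1.37, " The": -1.60, " I": -2.10},
    {" can": -1.68, " are": -1.90, " will": -2.40},
]
# Invariant clients rely on: one top_logprobs entry per generated token.
assert len(top_logprobs) == len(tokens)
```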

dsikka commented 6 months ago

@mgoin When you're happy with this and it passes any tests you have, let me know and we can merge this in. Also, if the tests you're running locally are worth adding to the openai_server tests, I can do that as part of this PR as well.