Werve opened this issue 5 months ago
I suspect this just hasn't been a priority yet. It could be added; I can try to write that change, but I will only have time in 5-6 days or so. I will subscribe to this issue, though.
It works for me with the v1/completions endpoint if I set logprobs: 10 in the request, at least with the exl2 backend.
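A minimal sketch of a request along those lines (the base URL, port, and prompt are placeholders; adjust them to your own server):

```python
import requests

# Hypothetical local server; text-generation-webui's OpenAI-compatible API
# commonly listens on port 5000, but adjust the URL to your setup.
url = "http://127.0.0.1:5000/v1/completions"

payload = {
    "prompt": "def reverse_string(s):",
    "max_tokens": 16,
    "logprobs": 10,  # ask for the top-10 logprobs per generated token
}

response = requests.post(url, json=payload, timeout=120)
print(response.json()["choices"][0]["logprobs"])
```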
Thank you for the feedback; I will try again in the near future to see if anything has changed since my last attempt.
I tried loading a gguf model via llama.cpp and used the /docs page created by text-generation-webui to test the OpenAI-compatible APIs. For example, sending the following request to /v1/completions:
```json
{
  "model": "string",
  "prompt": "string",
  "best_of": 1,
  "use_samplers": false,
  "echo": false,
  "top_logits": 50,
  "frequency_penalty": 0,
  "logit_bias": {},
  "logprobs": 50,
  "max_tokens": 16,
  "n": 1,
  "presence_penalty": 0,
  "stop": [
    "string"
  ],
  "stream": false,
  "suffix": "string",
  "temperature": 1,
  "top_p": 1,
  "user": "string",
  "preset": "string",
  "min_p": 0,
  "dynamic_temperature": false,
  "dynatemp_low": 1,
  "dynatemp_high": 1,
  "dynatemp_exponent": 1,
  "smoothing_factor": 0,
  "smoothing_curve": 1,
  "top_k": 0,
  "repetition_penalty": 1,
  "repetition_penalty_range": 1024,
  "typical_p": 1,
  "tfs": 1,
  "top_a": 0,
  "epsilon_cutoff": 0,
  "eta_cutoff": 0,
  "guidance_scale": 1,
  "negative_prompt": "",
  "penalty_alpha": 0,
  "mirostat_mode": 0,
  "mirostat_tau": 5,
  "mirostat_eta": 0.1,
  "temperature_last": false,
  "do_sample": true,
  "seed": -1,
  "encoder_repetition_penalty": 1,
  "no_repeat_ngram_size": 0,
  "truncation_length": 0,
  "max_tokens_second": 0,
  "prompt_lookup_num_tokens": 0,
  "custom_token_bans": "",
  "sampler_priority": [
    "string"
  ],
  "auto_max_new_tokens": false,
  "ban_eos_token": false,
  "add_bos_token": true,
  "skip_special_tokens": true,
  "grammar_string": ""
}
```
Returns:
```json
{
  "id": "conv-1716307415075573504",
  "object": "text_completion",
  "created": 1716307415,
  "model": "zephyr-7b-beta.Q5_K_M.gguf",
  "choices": [
    {
      "index": 0,
      "finish_reason": "length",
      "text": " = \"Python is awesome\"\n\n# Find the first vowelstring",
      "logprobs": {
        "top_logprobs": [
          {}
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 2,
    "completion_tokens": 18,
    "total_tokens": 20
  }
}
```
As can be seen, no logprobs data is returned.
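To reproduce the check in a few lines (the base URL here is an assumption; only the fields relevant to logprobs are sent):

```python
import json
import requests

# Hypothetical local endpoint; adjust host/port to your text-generation-webui instance.
url = "http://127.0.0.1:5000/v1/completions"

payload = {
    "prompt": "string",
    "max_tokens": 16,
    "logprobs": 50,
    "temperature": 1,
    "top_p": 1,
}

choice = requests.post(url, json=payload, timeout=120).json()["choices"][0]
logprobs = choice.get("logprobs") or {}
top = logprobs.get("top_logprobs") or []

# With the llama.cpp backend this prints an empty/placeholder structure,
# even though logprobs were explicitly requested.
print(json.dumps(logprobs, indent=2))
print("got per-token logprobs:", any(top))
```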
If instead you use /v1/internal/logits, for example by sending:
```json
{
  "prompt": "string",
  "use_samplers": false,
  "top_logits": 50,
  "frequency_penalty": 0,
  "max_tokens": 16,
  "presence_penalty": 0,
  "temperature": 1,
  "top_p": 1,
  "preset": "string",
  "min_p": 0,
  "dynamic_temperature": false,
  "dynatemp_low": 1,
  "dynatemp_high": 1,
  "dynatemp_exponent": 1,
  "smoothing_factor": 0,
  "smoothing_curve": 1,
  "top_k": 0,
  "repetition_penalty": 1,
  "repetition_penalty_range": 1024,
  "typical_p": 1,
  "tfs": 1,
  "top_a": 0,
  "epsilon_cutoff": 0,
  "eta_cutoff": 0,
  "guidance_scale": 1,
  "negative_prompt": "",
  "penalty_alpha": 0,
  "mirostat_mode": 0,
  "mirostat_tau": 5,
  "mirostat_eta": 0.1,
  "temperature_last": false,
  "do_sample": true,
  "seed": -1,
  "encoder_repetition_penalty": 1,
  "no_repeat_ngram_size": 0,
  "truncation_length": 0,
  "max_tokens_second": 0,
  "prompt_lookup_num_tokens": 0,
  "custom_token_bans": "",
  "sampler_priority": [
    "string"
  ],
  "auto_max_new_tokens": false,
  "ban_eos_token": false,
  "add_bos_token": true,
  "skip_special_tokens": true,
  "grammar_string": ""
}
```
then logprobs are returned correctly:
```json
{
  "1": 0.03324248269200325,
  "2": 0.001473770011216402,
  " =": 0.33165204524993896,
  "(": 0.23287194967269897,
  "_": 0.034331582486629486,
  "[]": 0.02858273684978485,
  " input": 0.026553742587566376,
  " longest": 0.01292374636977911,
  "=": 0.010019432753324509,
  " reverse": 0.009669930674135685,
  " Solution": 0.008004358969628811,
  " DL": 0.0075729419477283955,
  ".": 0.00756840780377388,
  " find": 0.006876722909510136,
  " solution": 0.005255056545138359,
  "=\"": 0.005123800598084927,
  "\n": 0.004228157922625542,
  " s": 0.004070453345775604,
  " remove": 0.0033274691086262465,
  "[": 0.0030798325315117836,
  " sort": 0.0029597424436360598,
  " ": 0.0028184654656797647,
  " name": 0.0027846412267535925,
  "ify": 0.0027198356110602617,
  "y": 0.002703306032344699,
  "?": 0.0025417900178581476,
  " trim": 0.0022532783914357424,
  " replace": 0.002217961475253105,
  ",": 0.0021920190192759037,
  " get": 0.0021445895545184612,
  " message": 0.002104737563058734,
  " read": 0.0016813671682029963,
  "To": 0.001632209517993033,
  " solve": 0.00146298308391124,
  " user": 0.0013596850913017988,
  " str": 0.0013504669768735766,
  " a": 0.0013481411151587963,
  ":": 0.001335840206593275,
  "(\"": 0.0012714873300865293,
  " first": 0.0012692000018432736,
  " is": 0.001227685483172536,
  " Find": 0.0011690461542457342,
  " format": 0.0011656478745862842,
  " my": 0.0011427566641941667,
  " lower": 0.0011346976971253753,
  " pal": 0.001128942472860217,
  "iest": 0.0010990109294652939,
  "()": 0.0010859015164896846,
  " add": 0.001075023552402854,
  " check": 0.0010711746290326118
}
```
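The call that produces a dictionary like the one above can also be scripted; a minimal sketch (base URL assumed, and only a few of the fields from the request shown above are set):

```python
import requests

# Hypothetical local endpoint; /v1/internal/logits is the endpoint used above.
url = "http://127.0.0.1:5000/v1/internal/logits"

payload = {
    "prompt": "string",
    "top_logits": 50,
    "use_samplers": False,
}

logits = requests.post(url, json=payload, timeout=120).json()
# The response maps candidate next tokens to probabilities, e.g. " =": 0.33...
for token, prob in list(logits.items())[:5]:
    print(repr(token), prob)
```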
So I think the lm-evaluation-harness framework does not work for evaluations that require logprobs, such as MMLU, since it expects to read logprobs alongside the generated response.
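To illustrate what such an evaluation needs from the endpoint, here is a simplified sketch of the usual loglikelihood pattern (this is not lm-evaluation-harness's actual code; the base URL is an assumption and the field names follow the OpenAI completions schema):

```python
import requests

# Hypothetical local endpoint; adjust to your server.
url = "http://127.0.0.1:5000/v1/completions"

def answer_loglikelihood(question: str, answer: str) -> float:
    """Score a candidate answer by summing the token logprobs the server reports.

    Simplified: the real harness isolates just the answer tokens; here we sum
    everything that comes back. The point is that loglikelihood tasks such as
    MMLU need the "logprobs" field to be populated at all.
    """
    payload = {
        "prompt": question + answer,
        "max_tokens": 1,
        "echo": True,       # ask the server to also score the prompt tokens
        "logprobs": 1,
    }
    choice = requests.post(url, json=payload, timeout=120).json()["choices"][0]
    token_logprobs = (choice.get("logprobs") or {}).get("token_logprobs") or []
    if not token_logprobs:
        raise RuntimeError("no logprobs returned; the task cannot be scored")
    return sum(lp for lp in token_logprobs if lp is not None)
```

If the completions endpoint returns an empty logprobs object, as in the response shown earlier, this kind of scoring fails immediately.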
Lately I have been trying to use https://github.com/EleutherAI/lm-evaluation-harness/tree/main, aiming to test quantized models as well, since there are now so many models on HF and it would be useful to understand how they perform on specific tasks or languages. But it seems that the OpenAI-compatible API of text-generation-webui does not return logprobs via "/v1/completions"; the relevant field is always empty.
Am I wrong, or is this still not possible? For the same model, "/v1/internal/logits" does seem to return values.