outlines-dev / outlines

Structured Text Generation
https://outlines-dev.github.io/outlines/
Apache License 2.0

OpenAI_Compatible (llamacpp server): tiktoken caught in a loop #609

Open · its-ven opened 5 months ago

its-ven commented 5 months ago

Describe the issue as clearly as possible:

I'm using the api_like_OAI.py script from the llamacpp repo, which works fine with the official OpenAI Python library. The code below even calls the server correctly: [screenshot: server log showing the incoming request]
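
For reference, this is roughly how I call the proxy with the official client (a minimal sketch, assuming an openai>=1.0 client and the same port as in the repro below):

from openai import OpenAI

# Point the official client at the local llama.cpp proxy instead of api.openai.com.
client = OpenAI(api_key="none", base_url="http://localhost:8081")

response = client.chat.completions.create(
    model="none",  # the proxy forwards to whatever model the llama.cpp server loaded
    messages=[{"role": "user", "content": "Is the following review positive or negative? Review: This restaurant is just awesome!"}],
)
print(response.choices[0].message.content)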

Adding a print statement in the tokenizer function shows it being called in an unending loop, regardless of the model name, including official ones like gpt-4 and gpt-3.5-turbo:

[screenshot: the tokenizer print output repeating indefinitely]

As an additional test, I attempted to use the default OpenAI model class by setting os.environ["OPENAI_BASE_URL"] = "http://localhost:8081", which just returns a connection error and no activity from the server.
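
To isolate whether that connection error is client configuration or the proxy itself, a raw HTTP request works as a check (the /chat/completions path is an assumption about what api_like_OAI.py exposes):

import requests

# Hypothetical connectivity check against the proxy port used in the repro below.
r = requests.post(
    "http://localhost:8081/chat/completions",
    json={"model": "none", "messages": [{"role": "user", "content": "ping"}]},
    timeout=10,
)
print(r.status_code)
print(r.text[:200])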

Steps/code to reproduce the bug:

import outlines

# Note: tiktoken.get_encoding(name) does not work here, so the encoding is selected via a model name.
model = outlines.models.OpenAICompatibleAPI(
    model_name="none",
    api_key="none",
    base_url="http://localhost:8081",
    encoding="gpt-4",
)

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?

Review: This restaurant is just awesome!
"""

generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
print(answer)
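
A side note on the comment in the snippet: tiktoken.get_encoding() expects an encoding name rather than a model name, which is why passing "gpt-4" to it fails; encoding_for_model() does that mapping. tiktoken itself also terminates fine here:

import tiktoken

# "gpt-4" is a model name; its underlying encoding is "cl100k_base".
enc = tiktoken.encoding_for_model("gpt-4")
assert enc.name == "cl100k_base"

# Encoding the choices returns immediately, so the loop is not inside tiktoken.
print(enc.encode("Positive"))
print(enc.encode("Negative"))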

Expected result:

"Positive"

Error message:

See above

Outlines/Python version information:

Outlines version: 0.0.25
Python version: 3.10.6

Context for the issue:

No response

lapp0 commented 5 months ago

I couldn't reproduce this. Could you consider trying the tighter integration documented at https://github.com/outlines-dev/outlines/blob/main/docs/reference/models/llamacpp.md instead?
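
Roughly, the integration from that doc looks like this (a sketch only; the exact constructor signature and keyword arguments may differ between outlines versions):

import outlines

# Load the GGUF directly through llama-cpp-python instead of going through a proxy.
# The extra keyword arguments are assumed to be forwarded to llama_cpp.Llama.
model = outlines.models.llamacpp(
    "./mistral-7b-instruct-v0.2.Q5_K_M.gguf",
    n_gpu_layers=35,
    n_ctx=4096,
)

generator = outlines.generate.choice(model, ["Positive", "Negative"])
print(generator("Is the following review positive or negative?\n\nReview: This restaurant is just awesome!"))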

its-ven commented 4 months ago

> I couldn't reproduce this. Could you consider trying the tighter integration documented at https://github.com/outlines-dev/outlines/blob/main/docs/reference/models/llamacpp.md instead?

I've already tried that, but the llamacpp library is much slower than running an OpenAI-compatible proxy. I'm launching the server via this batch file:

rem Start the OpenAI-compatible proxy, forwarding requests to the llama.cpp server on port 8080
start /B python oai_api.py --llama-api http://localhost:8080
rem Start the llama.cpp server: lock the model in RAM, offload 35 layers to the GPU, 4096-token context
start /B server --mlock -ngl 35 -m mistral-7b-instruct-v0.2.Q5_K_M.gguf -c 4096
pause