Closed: josephrocca closed this issue 1 month ago.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Not stale
Hi @josephrocca, thank you for opening this issue. I'm attempting to reproduce on main but am having some trouble.
TGI started with:

docker run --shm-size=1gb --gpus all \
    -v /nvme0n1/Models/:/data \
    -e HUGGINGFACE_HUB_CACHE=/data \
    -e MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1 \
    -e PORT=8080 \
    -p 3000:8080 \
    ghcr.io/huggingface/text-generation-inference:2.2.0
Completions call with and without a stop value:
from openai import OpenAI
import os

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key=os.getenv("HF_TOKEN", "YOUR_API_KEY"),
)

SEED = 1337
PROMPT = "What are three words that describe the Python programming language?"
MODEL = "mistralai/Mistral-7B-Instruct-v0.1"


def without_stop():
    chat_completion = client.completions.create(
        model=MODEL,
        prompt=PROMPT,
        max_tokens=20,
        stream=False,
        seed=SEED,
    )
    generated = chat_completion.choices[0]
    print(generated)


def with_stop():
    chat_completion = client.completions.create(
        model=MODEL,
        prompt=PROMPT,
        max_tokens=20,
        stream=False,
        seed=SEED,
        stop=["-"],
    )
    generated = chat_completion.choices[0]
    print(generated)


without_stop()
with_stop()
Response:

Object-oriented, Versatile, Easy to learn.
Object-
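For completeness, printing the finish reason makes it explicit whether the stop sequence fired; a sketch using the OpenAI client's Choice fields (e.g. inside with_stop above):

choice = chat_completion.choices[0]
# finish_reason is "stop" when a stop sequence (or EOS) ended generation,
# and "length" when max_tokens was exhausted.
print(choice.finish_reason, repr(choice.text))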
Would you kindly try the latest build? If the issue persists, would you be able to share a reproduction script? Thank you!
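For reference, rerunning the same setup against the newest image only changes the tag (a sketch; assuming the latest tag tracks recent builds):

docker pull ghcr.io/huggingface/text-generation-inference:latest
docker run --shm-size=1gb --gpus all \
    -v /nvme0n1/Models/:/data \
    -e HUGGINGFACE_HUB_CACHE=/data \
    -e MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1 \
    -e PORT=8080 \
    -p 3000:8080 \
    ghcr.io/huggingface/text-generation-inference:latest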
Closing, as I just retested with meta-llama-3.1-8B-Instruct locally and stop is working as expected:
from openai import OpenAI
import os

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key=os.getenv("HF_TOKEN", "YOUR_API_KEY"),
)

SEED = 1337
PROMPT = "What are three words that describe the Python programming language?"
MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"


def without_stop():
    chat_completion = client.completions.create(
        model=MODEL,
        prompt=PROMPT,
        max_tokens=10,
        stream=False,
        seed=SEED,
    )
    generated = chat_completion.choices[0]
    print("without_stop\t", generated.text)


def with_stop():
    chat_completion = client.completions.create(
        model=MODEL,
        prompt=PROMPT,
        max_tokens=10,
        stream=False,
        seed=SEED,
        stop=[","],
    )
    generated = chat_completion.choices[0]
    print("with_stop\t", generated.text)


without_stop()
with_stop()

# OUTPUT
# without_stop  (Pick one, two or three that you feel
# with_stop     (Pick one,
System Info
Official docker image, v2.0.4
Reproduction
Try using the stop param for the /v1/completions endpoint. Note that this bug report is not related to the previously linked issue about token boundary limitations; this bug report is about the stop param not working at all for the /v1/completions endpoint.
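A minimal sketch of the failing call (prompt and stop value are illustrative; it mirrors the scripts above):

import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key=os.getenv("HF_TOKEN", "YOUR_API_KEY"),
)
completion = client.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    prompt="What are three words that describe the Python programming language?",
    max_tokens=20,
    stop=["-"],  # on v2.0.4 this is reportedly ignored by /v1/completions
)
print(completion.choices[0].text)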
Expected behavior

It should work - currently it doesn't work at all. Ideally with the same behavior as OpenAI's (i.e. no token boundary limitations - see the above linked issue), but if that's not possible, then like /generate.
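For comparison, a sketch of an equivalent request against TGI's native /generate endpoint, where stop sequences are passed via parameters.stop (port assumed to match the docker command above):

import requests

response = requests.post(
    "http://localhost:3000/generate",
    json={
        "inputs": "What are three words that describe the Python programming language?",
        "parameters": {"max_new_tokens": 20, "stop": ["-"]},
    },
)
# /generate honors the stop sequences, which is the fallback behavior
# this report asks for.
print(response.json()["generated_text"])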