Katehuuh closed this issue 1 month ago.
This is likely an instruct template issue. Not sure how the TGW loader works with templates, but probably you can fix it by modifying the config.json to only list token 128009 under "eos_token_id".
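For reference, here is a minimal sketch of that edit as a small script, assuming a standard local model directory (the path is a placeholder; Llama 3.1 Instruct normally ships eos_token_id as the list [128001, 128008, 128009]):

```python
import json
from pathlib import Path

# Placeholder path -- point this at your local quantized model directory.
config_path = Path("models/Llama-3.1-8B-Instruct-exl2/config.json")

config = json.loads(config_path.read_text())

# Llama 3.1 lists several stop tokens under "eos_token_id"; keep only
# 128009 (<|eot_id|>), the end-of-turn token, so loaders that expect a
# single EOS token stop after each assistant response.
config["eos_token_id"] = 128009

config_path.write_text(json.dumps(config, indent=2))
print("eos_token_id is now:", config["eos_token_id"])
```

The same change can of course be made by editing config.json by hand.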
It works, thanks. Should this be filed as an instruct template/loader issue for oobabooga, then?
@turboderp By removing the eos_token_id entries 128008 and 128009, will it cause further issues?
It's really up to the frontend to specify what the stop conditions are, as part of the instruct template. But because HF has a very confused format, these conflicts occur every now and again. ExLlama has a single token which is used as a default stop condition, so it doesn't really know what to do with models that decided they wanted multiple stop tokens all of a sudden. The frontend can still set as many stop conditions as it likes, though, to suit whatever instruct format it decides to use.
Switching to 128009 works for Llama3 specifically because that token marks the end of model responses in the L3 instruct template, and some frontends assume that a) model responses are supposed to end with EOS and b) models only define a single EOS token. Making matters a little more complicated, L3 had some errors in its config when it first launched (defining <|end_of_text|> in tokenizer_config.json instead of <|eot_id|>, etc.), and even as the models have been updated, those changes aren't always reflected in the many quantized versions already on HF.
But to be clear, the eos_token_id value in config.json is only really used as a default, and it's more of a suggestion. Changing it shouldn't break anything.
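To illustrate the point about the frontend owning the stop conditions, here is a minimal sketch of pinning them explicitly with ExLlamaV2's dynamic generator, regardless of what config.json says. The model path is a placeholder, and the exact names (ExLlamaV2DynamicGenerator, generate, stop_conditions, single_id) reflect recent ExLlamaV2 releases and may differ in older versions:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder model directory.
config = ExLlamaV2Config("models/Llama-3.1-8B-Instruct-exl2")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# The frontend decides what ends a turn: here both Llama 3 end-of-turn
# tokens plus a literal string, independent of config.json's eos_token_id.
stop_conditions = [
    tokenizer.single_id("<|eot_id|>"),
    tokenizer.single_id("<|end_of_text|>"),
    "<|start_header_id|>",  # string stop conditions are also allowed
]

output = generator.generate(
    prompt="Write one sentence about llamas.",
    max_new_tokens=200,
    stop_conditions=stop_conditions,
    add_bos=True,
)
print(output)
```

Whatever the frontend passes here overrides the default, so it can match the instruct template it actually uses.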
OS: Windows
GPU Library: CUDA 12.x
Python version: 3.10
Pytorch version: 3.10.8
Model: turboderp/Llama-3.1-8B-Instruct-exl2
Describe the bug
I always receive "assistant" at the end of each sentence. The bug only occurs with the Llama-3.1 family. The following models were tested:
I am using oobabooga/text-generation-webui, and the issue only occurs in chat-instruct mode (the chat mode works correctly). This also happens with the Llama-v3 instruction template. The ExLlamav2_HF and Transformers model loaders work correctly.
Reproduction steps
To clarify, using the official meta-llama/Meta-Llama-3.1-8B-Instruct with ExLlamav2 as the loader causes the issue.
Expected behavior
.
Logs
No response
Additional context
No response