Closed: Katehuuh closed this issue 1 hour ago
This is likely an instruct template issue. Not sure how the TGW loader works with templates, but probably you can fix it by modifying the config.json to only list token 128009 under "eos_token_id".
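The suggested `config.json` edit can be scripted. This is a minimal sketch, assuming the model's local `config.json` ships a list of stop tokens such as `[128001, 128008, 128009]` (the Llama-3.1 default); the function name and path are hypothetical:

```python
import json

def pin_eos_token(path: str) -> None:
    """Rewrite a model's config.json so eos_token_id lists only 128009."""
    with open(path) as f:
        cfg = json.load(f)
    # Llama-3.1 configs ship a list such as [128001, 128008, 128009];
    # keeping only 128009 (<|eot_id|>) makes generation stop at end of turn.
    cfg["eos_token_id"] = 128009
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
```

Editing the file by hand to read `"eos_token_id": 128009` has the same effect.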
It works, thanks. Should this be filed as an instruction-template loader issue for oobabooga then?
@turboderp by removing `128008` and `128009` from `eos_token_id`, will it cause further issues?
OS: Windows
GPU Library: CUDA 12.x
Python version: 3.10
PyTorch version: 3.10.8
Model: turboderp/Llama-3.1-8B-Instruct-exl2
Describe the bug
I always receive `assistant` at the end of each sentence. The bug only occurs with the Llama-3.1 family. The following models were tested:
I am using oobabooga/text-gen, and the issue only occurs in chat-instruct mode (the `chat` mode works correctly). This also happens with the `Llama-v3` instruction template. The `ExLlamav2_HF` and `Transformers` model loaders work correctly.
Reproduction steps
To clarify, using the official meta-llama/Meta-Llama-3.1-8B-Instruct with ExLlamav2 as the loader causes the issue.
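A plausible explanation for the stray `assistant`, assuming the standard Llama-3 chat format: if the loader does not treat `<|eot_id|>` (id 128009) as a stop token, the model samples past the end of its turn into the next turn header, and once the front end strips the special tokens only the literal word `assistant` survives. A hypothetical illustration:

```python
# If <|eot_id|> is not a stop token, generation runs into the next
# turn header: "<|eot_id|><|start_header_id|>assistant<|end_header_id|>".
raw = "Paris is the capital.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

SPECIALS = ["<|eot_id|>", "<|start_header_id|>", "<|end_header_id|>"]

def strip_specials(text: str) -> str:
    # Front ends typically hide special tokens, leaving plain text behind.
    for tok in SPECIALS:
        text = text.replace(tok, "")
    return text

print(strip_specials(raw))  # the word "assistant" is left dangling at the end
```

This matches the fix above: once 128009 is the sole `eos_token_id`, generation stops before the header is ever emitted.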
Expected behavior
Generation should stop cleanly at the end of the reply, without a trailing `assistant`.
Logs
No response
Additional context
No response
Acknowledgements