Since Llama 3, the bos/eos token IDs have changed. For example, in meta-llama/Llama-3.1-8B-Instruct:
"bos_token_id": 128000,
"eos_token_id": [
128001,
128008,
128009
],
In the text-generation example, `model.generation_config.pad_token_id` is forced to 0, but token ID 0 represents '!' in the meta-llama/Llama-3.1-8B-Instruct tokenizer vocabulary. So it looks like a token ID mismatch.
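For illustration, here is a minimal standalone sketch (not part of run_generation.py; it assumes `transformers` is installed and you have access to the gated meta-llama checkpoint) showing the mismatch between the forced pad ID and the model's configured special tokens:

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# Token ID 0 is an ordinary vocabulary entry ('!'), not a special token,
# so forcing pad_token_id = 0 leaks '!' into decoded output.
print(tokenizer.decode([0]))   # '!'
print(config.bos_token_id)     # 128000
print(config.eos_token_id)     # [128001, 128008, 128009]

# A common workaround is to pad with one of the eos tokens instead of 0.
eos = config.eos_token_id
pad_id = eos[0] if isinstance(eos, list) else eos
```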
@aslanxie This should have been fixed by https://github.com/huggingface/optimum-habana/pull/1444, which I just merged into main. Can you try again on the main branch and let me know whether it works on your side too?
@regisss It's working on v1.14.0 now.
System Info

Information

Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
1. Clone and install optimum-habana.
2. Move to `examples/text-generation` and run:

```bash
python3 run_generation.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --use_hpu_graphs --limit_hpu_graph --use_kv_cache --reuse_cache --trim_logits --attn_softmax_bf16 --max_input_tokens 512 --max_new_tokens 2048 --bf16 --batch_size 1 --warmup 0 --n_iterations 3
```
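To see why the pad ID surfaces as '!' in the text, here is a small hypothetical check (again assuming `transformers` and access to the checkpoint) that simulates a sequence right-padded with token 0:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Simulate a generation that was padded with pad_token_id = 0.
ids = tokenizer("Hello")["input_ids"] + [0, 0, 0]

# skip_special_tokens cannot filter token 0 out, because it is a regular
# vocabulary entry ('!') rather than a registered special token.
print(tokenizer.decode(ids, skip_special_tokens=True))  # "Hello!!!"
```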
The output looks like the one below; the '!' characters are unexpected padding in the output:
Expected behavior
The expected output should be: