Open minipasila opened 3 months ago
Wait, you're using the base model (not instruct), correct?
Yeah, the base model.
Oh no, I don't think using the Llama 3.1 chat template with the base model is a good idea. Those tokens are actually untrained in the base model, so you will get incorrect finetuning results. Weird, did Unsloth not error out?
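For anyone who wants to check the "untrained tokens" point on their own checkpoint, here is a minimal sketch of one possible heuristic. It assumes that untrained tokens show up as (near-)zero rows in the embedding matrix, which may not hold for every checkpoint. The function name `find_untrained_rows`, the `eps` threshold, and the toy matrix are all hypothetical and only for illustration; this is not Unsloth's actual check.

```python
import math

def find_untrained_rows(embeddings, eps=1e-8):
    """Return indices of rows whose L2 norm is at most eps.

    Assumption (not confirmed by the thread): an untrained token has an
    embedding row that is all zeros or very close to it.
    """
    untrained = []
    for idx, row in enumerate(embeddings):
        norm = math.sqrt(sum(v * v for v in row))
        if norm <= eps:
            untrained.append(idx)
    return untrained

# Toy stand-in for a real embedding matrix (rows = token IDs).
toy_embeddings = [
    [0.12, -0.40, 0.33],   # trained token
    [0.0, 0.0, 0.0],       # looks untrained
    [-0.25, 0.08, 0.91],   # trained token
    [0.0, 0.0, 0.0],       # looks untrained
]
print(find_untrained_rows(toy_embeddings))
```

With a real model you would pass the rows of the input embedding matrix instead of the toy list, then map the flagged indices back to token strings with the tokenizer to see whether the chat template's special tokens are among them.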
I didn't see any visible errors, at least. But when actually using the model, it emits random reserved special tokens instead of the Llama 3 Instruct tokens once it finishes generating the response. Instead of outputting something like <|eot_id|><|start_header_id|>user<|end_header_id|> at the end, it outputs those unused tokens for some reason, so it looks more like <|reserved_special_token_34|><|reserved_special_token_57|>user<|reserved_special_token_221|>.
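A quick way to spot this symptom in decoded output is to scan for the reserved token pattern. The sketch below is only a diagnostic aid, not a fix; `inspect_output` and the two sample strings are hypothetical, with the samples modelled on the outputs quoted above. It assumes the text was decoded with special tokens kept.

```python
import re

# Matches tokens such as <|reserved_special_token_34|> and captures the number.
RESERVED = re.compile(r"<\|reserved_special_token_(\d+)\|>")
# The Llama 3 chat-format tokens the thread expects to see instead.
EXPECTED = ("<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>")

def inspect_output(text):
    """Return (reserved token numbers found, count of each expected token)."""
    reserved_ids = [int(n) for n in RESERVED.findall(text)]
    expected_counts = {tok: text.count(tok) for tok in EXPECTED}
    return reserved_ids, expected_counts

bad = ("Hello!<|reserved_special_token_34|>"
       "<|reserved_special_token_57|>user<|reserved_special_token_221|>")
good = "Hello!<|eot_id|><|start_header_id|>user<|end_header_id|>"

print(inspect_output(bad))
print(inspect_output(good))
```

A non-empty first element means the model is emitting reserved tokens where the chat-format tokens should be.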
When training either the Llama 3 or 3.1 8B base model using the Llama 3 template as the conversation prompt format, it doesn't seem to train with the correct tokens. The model ends up producing text containing <|reserved_special_token_0|> tokens instead of the <|start_header_id|>, <|end_header_id|> and <|eot_id|> tokens, which breaks formatting. I don't remember having this issue before, so I assume some recent change may have broken it. One thing to note is that when previewing the dataset (using print(dataset[5]["text"])) it shows up properly with the correct Llama 3 formatting.