In the config_tiny_llama.py example, the model's vocab_size is 256, while the GPT-2 tokenizer's vocabulary size is 50257. This mismatch is handled by masking out-of-range token IDs when tp>1, but it raises an error when tp=1. This PR therefore ensures that the same masking is also applied when tp=1.
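
For context, here is a minimal sketch of the kind of masking involved. This is not nanotron's actual implementation; the names, shapes, and the choice of mapping masked IDs to index 0 are illustrative assumptions. It only shows how token IDs beyond the model's vocab_size can be masked so they never index past the embedding table, mirroring what the sharded (tp>1) path already does per vocabulary shard:

```python
# Hypothetical sketch, not nanotron's code: guard against token IDs that
# exceed the model's vocab_size (256) while the tokenizer produces IDs up
# to 50257, as in config_tiny_llama.py.
import torch
import torch.nn as nn

vocab_size = 256          # model vocabulary (config_tiny_llama.py)
tokenizer_vocab = 50257   # GPT-2 tokenizer vocabulary

embedding = nn.Embedding(vocab_size, 32)
input_ids = torch.randint(0, tokenizer_vocab, (2, 8))

# Mask out-of-range IDs: remap them to a valid index (0 here, an arbitrary
# choice for illustration) and zero out their embeddings afterwards.
out_of_range = input_ids >= vocab_size
masked_ids = input_ids.masked_fill(out_of_range, 0)
hidden = embedding(masked_ids)
hidden = hidden.masked_fill(out_of_range.unsqueeze(-1), 0.0)
```

Without such a guard at tp=1, the out-of-range IDs would hit the embedding lookup directly and trigger an index error.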