pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

Add comments explaining why token ID 128011 is skipped in the Llama3 tokenizer #2014

Open RdoubleA opened 1 week ago

RdoubleA commented 1 week ago

See #1995 for more context. All that's needed is a comment block at the linked line that summarizes the conclusions from that discussion: https://github.com/pytorch/torchtune/blob/main/torchtune/models/llama3/_tokenizer.py#L29