poloclub / transformer-explainer

Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
https://poloclub.github.io/transformer-explainer/
MIT License
2.6k stars 223 forks

Same input token - different ID values? #15

Closed (mazurkin closed this issue 1 month ago)

mazurkin commented 1 month ago

Please check the screenshot image

Why are the IDs for the two instances of "data" different?

Also, every time I open that section, the ID values change.

gracekimcy commented 1 month ago

Thank you for bringing this to our attention! The changing IDs have now been fixed, and the values should remain consistent each time you open the section. As for the two instances of "data": the first token is "Data" (capital 'D') while the last token is " data" (lowercase 'd' with a leading space!), so the tokenizer treats them as two distinct tokens with different IDs. You can explore tokenization further with this tool.
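To make the "Data" vs. " data" point concrete, here is a minimal sketch of a toy tokenizer. It is not the tokenizer used by Transformer Explainer: the regex, the helper names (`pre_tokenize`, `build_vocab`, `encode`), the example sentence, and the ID values are all hypothetical and chosen only for illustration. It assumes, as many subword tokenizers do, that a leading space stays attached to the word that follows it and that lookup is case-sensitive.

```python
import re

def pre_tokenize(text):
    # Toy rule (an assumption, not the real tokenizer): keep one leading
    # space attached to the word or punctuation run that follows it.
    return re.findall(r" ?\w+| ?[^\w\s]+", text)

def build_vocab(pieces):
    # Assign hypothetical IDs in order of first appearance.
    vocab = {}
    for piece in pieces:
        if piece not in vocab:
            vocab[piece] = len(vocab)
    return vocab

def encode(text, vocab):
    # Exact-match lookup: case and leading spaces both matter.
    return [vocab[piece] for piece in pre_tokenize(text)]

text = "Data scientists love data"  # hypothetical example sentence
pieces = pre_tokenize(text)
vocab = build_vocab(pieces)
ids = encode(text, vocab)

for piece, token_id in zip(pieces, ids):
    print(repr(piece), token_id)

# Encoding the same text again yields the same IDs.
print(encode(text, vocab) == ids)
```

In this sketch "Data" and " data" are different strings, so they land on different IDs, while encoding the same text twice always returns the same IDs. That matches both halves of the answer above: distinct tokens get distinct IDs, and a given token's ID should not change between views.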