stanford-crfm / levanter

Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
https://levanter.readthedocs.io/en/latest/
Apache License 2.0

fix for token bug that skips EOS #815

Closed: ahmeda14960 closed this 5 days ago

ahmeda14960 commented 5 days ago

This actually adds EOS when the tokenizer's output doesn't already include it (Hugging Face tokenizers don't handle this reliably).
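
A rough sketch of the idea, not the actual diff in this PR: append EOS only when the encoded sequence doesn't already end with it. The helper name `ensure_eos` and the token ids below are made up for illustration; in practice the EOS id would come from the tokenizer (e.g. `tokenizer.eos_token_id`).

```python
from typing import List, Sequence


def ensure_eos(token_ids: Sequence[int], eos_token_id: int) -> List[int]:
    """Return a copy of token_ids that is guaranteed to end with eos_token_id.

    Hypothetical helper for illustration; not part of Levanter's API.
    """
    ids = list(token_ids)
    # Append EOS only if the sequence is empty or doesn't already end with it,
    # so sequences that already carry EOS aren't given a second one.
    if not ids or ids[-1] != eos_token_id:
        ids.append(eos_token_id)
    return ids


# Example with made-up ids, assuming EOS has id 2.
encoded = [[101, 7, 8], [101, 9, 2]]
fixed = [ensure_eos(ids, eos_token_id=2) for ids in encoded]
print(fixed)
```

Checking the last token rather than appending unconditionally keeps this safe for tokenizers that do add EOS on their own.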