lamalab-org / MatText

Text-based modeling of materials.
https://lamalab-org.github.io/MatText/
MIT License

are our tokenizers initialized correctly? #99

Open kjappelbaum opened 4 weeks ago

kjappelbaum commented 4 weeks ago

perhaps not for batch inference

n0w0f commented 3 weeks ago

For the Llama runs we do not use the MatText tokenizers, though.

n0w0f commented 3 weeks ago

Ah, I see now. The issue was that the Llama tokenizer does not include a pad token, so we set tokenizer.pad_token = tokenizer.eos_token (ref).
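To make the batch-inference concern concrete, here is a minimal plain-Python sketch of what reusing EOS as the pad token implies. It does not call the Hugging Face tokenizer; `pad_batch`, `EOS_ID`, and the token ids are hypothetical stand-ins, and left padding is an assumption (it is the usual choice for decoder-only generation, not something confirmed for these runs).

```python
# Illustrative only: plain-Python stand-in for a tokenizer with no pad token.
# EOS_ID and the token ids below are made-up values, not Llama's real ids.
EOS_ID = 2
PAD_ID = EOS_ID  # mirrors tokenizer.pad_token = tokenizer.eos_token


def pad_batch(sequences, pad_id=PAD_ID, side="left"):
    """Pad variable-length id lists to one length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        pads, zeros = [pad_id] * n_pad, [0] * n_pad
        ones = [1] * len(seq)
        if side == "left":
            input_ids.append(pads + list(seq))
            attention_mask.append(zeros + ones)
        else:
            input_ids.append(list(seq) + pads)
            attention_mask.append(ones + zeros)
    return input_ids, attention_mask


# Two sequences of different length, each ending in a real EOS token.
batch = [[5, 6, 7, EOS_ID], [8, EOS_ID]]
input_ids, attention_mask = pad_batch(batch)
```

Because padding and the real EOS share one id, only the attention mask tells them apart, so a batch passed without its mask (or padded on the wrong side) is one plausible way batch inference could go wrong.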

We also tried adding a new token instead, but that resized the vocabulary and created a set of problems.
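For readers wondering why a new token has side effects, a toy sketch follows. It uses a hypothetical dictionary vocabulary and a list-of-lists embedding table, not MatText or Llama code. In Hugging Face transformers the analogous calls are `tokenizer.add_special_tokens` and `model.resize_token_embeddings`. Which specific problems showed up in these runs is not stated here, so this only illustrates the general mechanism.

```python
# Illustrative only: toy vocabulary and embedding table, not real model code.
vocab = {"<s>": 0, "</s>": 1, "Na": 2, "Cl": 3}
embedding_dim = 4
# One embedding row per token id, as in a pretrained checkpoint.
embeddings = [[0.0] * embedding_dim for _ in vocab]


def add_token(vocab, token):
    """Add a token with the next free id; return True if the vocabulary grew."""
    if token in vocab:
        return False
    vocab[token] = len(vocab)
    return True


def resize_embeddings(embeddings, new_size, dim):
    """Append zero-initialised rows until there is one row per token id."""
    while len(embeddings) < new_size:
        embeddings.append([0.0] * dim)
    return embeddings


grew = add_token(vocab, "<pad>")
pad_id = vocab["<pad>"]
# The new id points past the end of the pretrained embedding table.
needs_resize = pad_id >= len(embeddings)
embeddings = resize_embeddings(embeddings, len(vocab), embedding_dim)
```

The new row is untrained, and the model's shapes no longer match the original checkpoint, which is why reusing EOS as the pad token is often the less invasive option.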