Open kjappelbaum opened 3 months ago
For the Llama runs we do not use the mattext tokenizers, though.
Ah I see now.
There was the issue of the Llama tokenizer not including a pad token.
So we set `tokenizer.pad_token = tokenizer.eos_token`.
ref.
We also tried adding a dedicated pad token, but that resizes the vocabulary and creates a set of problems.
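A minimal sketch of the two approaches with the Hugging Face `transformers` API (using GPT-2 here as a stand-in, since it also ships without a pad token; the actual Llama checkpoint name is omitted):

```python
# Sketch: two ways to give a tokenizer without a pad token one.
# GPT-2 is used as an illustrative stand-in for a Llama checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
assert tokenizer.pad_token is None  # no pad token out of the box

# Option 1: reuse the EOS token as pad token -- vocab size is unchanged.
tokenizer.pad_token = tokenizer.eos_token

# Option 2 (commented out): add a dedicated pad token. This grows the
# vocab, so the model's embedding matrix must be resized to match,
# which is where the problems mentioned above come from:
# tokenizer.add_special_tokens({"pad_token": "<pad>"})
# model.resize_token_embeddings(len(tokenizer))
```

With option 1, padded positions should still be masked out via the attention mask so the repeated EOS tokens do not affect the loss or generation.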
This is not an issue for the serial interface currently in the code, but it might matter for batched inference in the future.
Perhaps not even for batched inference.