May I ask how to obtain the bin file for the tokenizer? Thank you.

iangitonga / nanochatllms.cpp

C++ implementation of LLMs with less than 3 billion params.

MIT License

Closed zss205 closed 5 months ago

iangitonga commented 5 months ago

Hello, apologies for the delayed response.

For the MiniCPM and TinyLlama models, I used the script HERE to create the tokenizer bin files.

As for the Zephyr model, I used the bin data for the GPT-2 tokenizer obtained from HERE. That bin data is embedded in the model files.
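
For anyone who wants a rough idea of what producing such a file can look like, here is a minimal sketch. It is not the script linked above, and the layout it writes (a little-endian int32 token count, then for each token an int32 byte length followed by the UTF-8 bytes) is an assumption for illustration only; the format nanochatllms.cpp actually expects may differ. The function names `write_vocab_bin` and `read_vocab_bin` are hypothetical.

```python
import struct


def write_vocab_bin(tokens, path):
    """Write tokens to a binary file.

    Hypothetical layout (an assumption, not the repo's real format):
    [int32 token count], then per token [int32 byte length][UTF-8 bytes].
    """
    with open(path, "wb") as f:
        f.write(struct.pack("<i", len(tokens)))
        for tok in tokens:
            data = tok.encode("utf-8")
            f.write(struct.pack("<i", len(data)))
            f.write(data)


def read_vocab_bin(path):
    """Read back a file produced by write_vocab_bin, in token-id order."""
    tokens = []
    with open(path, "rb") as f:
        (count,) = struct.unpack("<i", f.read(4))
        for _ in range(count):
            (size,) = struct.unpack("<i", f.read(4))
            tokens.append(f.read(size).decode("utf-8"))
    return tokens
```

In practice the token list would come from the model's own tokenizer vocabulary, ordered by token id, so that the position of each entry in the file matches its id.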