microsoft / Phi-3CookBook

This is a Phi-3 cookbook for getting started with Phi-3, a family of open-source AI models developed by Microsoft. Phi-3 models are among the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
MIT License

4K vs 128K version #90

Closed ealmazanm closed 2 months ago

ealmazanm commented 2 months ago

Hello there, a quick question. I just read the technical report and I am not sure what the benefit of using the 4K version is. Apparently the 128K version has similar performance across the board. My understanding is that the architecture is exactly the same; the only difference is the post-training stage and the data used for that stage.

Thanks!

leestott commented 2 months ago

Hi @ealmazanm, I noticed you also posted the same question in the Hugging Face discussions: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/discussions/87#668cd9454912e2ac2155ef83

The Phi-3 model is available in two context-length variants, 4K and 128K tokens: https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/

Here are some key differences and benefits of using the 4K version versus the 128K version:

Size and Efficiency

The 4K version of the Phi-3 model has a smaller context window, which means it requires less memory and compute at inference time. This makes it more cost-effective, easier to fine-tune, and allows for faster response times. It is especially well suited to resource-constrained environments, including on-device and offline inference scenarios, and to latency-bound scenarios where fast response times are critical.

Context Awareness

The 4K version ensures efficient processing of input data while maintaining context awareness for prompts that fit within its window. The 128K version, on the other hand, is ideal for tasks requiring broader context comprehension, such as summarizing very long documents.

The choice between the 4K and 128K versions depends on your specific needs. If you need a model that is more cost-effective and efficient, the 4K version is a good choice. If your tasks require broader context comprehension, the 128K version is more suitable. Both versions offer high performance for their size class.
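The decision rule above can be sketched in code. The helper below is hypothetical (not part of the Phi-3 samples): it picks a variant's Hugging Face model id based on a rough token estimate of the prompt, falling back to the 128K variant when the input plus a reserved output budget would not fit in a 4K window. The 0.75 words-per-token ratio is a coarse English-text heuristic, not an exact tokenizer count.

```python
def choose_phi3_variant(prompt: str, reserve_for_output: int = 512) -> str:
    """Pick a Phi-3 mini variant based on an estimated prompt length.

    Hypothetical routing helper: uses a rough ~0.75 words-per-token
    heuristic instead of running the real tokenizer.
    """
    est_tokens = int(len(prompt.split()) / 0.75)  # rough token estimate
    if est_tokens + reserve_for_output <= 4096:
        # Prompt plus output budget fits in the 4K context window.
        return "microsoft/Phi-3-mini-4k-instruct"
    # Otherwise fall back to the long-context variant.
    return "microsoft/Phi-3-mini-128k-instruct"
```

For example, a short chat prompt routes to the 4K model, while a book-length document routes to the 128K model. In a real application you would count tokens with the model's actual tokenizer before deciding.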