microsoft / Phi-3CookBook

This is a Phi-3 cookbook for getting started with Phi-3, a family of open-source AI models developed by Microsoft. Phi-3 models are among the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
MIT License

4K vs 128K version #90

Closed ealmazanm closed 2 months ago

ealmazanm commented 2 months ago

Hello there, a quick question. I just read the technical report and I am not sure what the benefit of using the 4K version is. Apparently the 128K version has similar performance across the board. My understanding is that the architecture is exactly the same; the only difference is the post-training stage and the data used for that stage.

Thanks!

leestott commented 2 months ago

Hi @ealmazanm, I noticed you also posted the same question in the Hugging Face discussions: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/discussions/87#668cd9454912e2ac2155ef83

The Phi-3 model is available in two context-length variants, 4K and 128K tokens: https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/

Here are some key differences and benefits of using the 4K version versus the 128K version:

Size and Efficiency

The 4K version of the Phi-3 model has a smaller context window, which means it requires less memory and compute at inference time. This makes it more cost-effective, easier to fine-tune, and allows for faster response times. It is especially well suited to resource-constrained environments, including on-device and offline inference scenarios, and to latency-bound scenarios where fast response times are critical.

Context Awareness

The 4K version ensures efficient processing of input data while maintaining context awareness for prompts that fit within its window. The 128K version, on the other hand, is ideal for tasks requiring broader context comprehension, such as summarizing very long documents.

The choice between the 4K and 128K versions depends on your specific needs. If you need a model that is more cost-effective and efficient, the 4K version is a good choice. If your tasks require broader context comprehension, the 128K version is more suitable. Both versions offer high performance for their size class.
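The decision rule above can be sketched in code. The helper below is hypothetical (not part of the Phi-3 samples): it picks a variant's Hugging Face model id based on a rough token estimate of the prompt, falling back to the 128K variant when the input plus a reserved output budget would not fit in a 4K window. The 0.75 words-per-token ratio is a coarse English-text heuristic, not an exact tokenizer count.

```python
def choose_phi3_variant(prompt: str, reserve_for_output: int = 512) -> str:
    """Pick a Phi-3 mini variant based on an estimated prompt length.

    Hypothetical routing helper: uses a rough ~0.75 words-per-token
    heuristic instead of running the real tokenizer.
    """
    est_tokens = int(len(prompt.split()) / 0.75)  # rough token estimate
    if est_tokens + reserve_for_output <= 4096:
        # Prompt plus output budget fits in the 4K context window.
        return "microsoft/Phi-3-mini-4k-instruct"
    # Otherwise fall back to the long-context variant.
    return "microsoft/Phi-3-mini-128k-instruct"
```

For example, a short chat prompt routes to the 4K model, while a book-length document routes to the 128K model. In a real application you would count tokens with the model's actual tokenizer before deciding.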