Open WilTay1 opened 8 months ago
Thanks for the great work. Why is the 13B-chat model smaller in size than the 7B-chat model? Which one performs better? Thanks!
The 7B model is stored in fp32 (4 bytes per parameter), while the 13B model is stored in bf16 (2 bytes per parameter), so the 13B weights take up less space on disk despite having more parameters. Yes, the 13B model achieves better performance.
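To make the size difference concrete, here is a rough back-of-the-envelope sketch. It assumes nominal parameter counts of exactly 7 billion and 13 billion (the real checkpoints differ slightly) and ignores any file-format overhead; the helper name `checkpoint_size_gb` is just for illustration.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def checkpoint_size_gb(num_params: float, dtype: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal parameter counts (assumed, not exact).
size_7b = checkpoint_size_gb(7e9, "fp32")
size_13b = checkpoint_size_gb(13e9, "bf16")

print(f"7B  in fp32: ~{size_7b:.0f} GB")
print(f"13B in bf16: ~{size_13b:.0f} GB")
```

Under these assumptions the 7B weights come to about 28 GB and the 13B weights to about 26 GB, which is why the larger model ends up with the smaller download.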