meta-llama / llama

Inference code for Llama models

Rotary position embedding causes different output under different tensor parallel settings! #203

Open marscrazy opened 1 year ago

marscrazy commented 1 year ago

Thanks for your great work on LLMs. I have tried loading llama-13b with different model parallel (mp) sizes, e.g., 2 and 4. However, the output embeddings and the generated sentences change when the mp size changes.

My question: Is this normal?

mp size = 4: [screenshot: output embedding statistics]

mp size = 2: [screenshot: output embedding statistics]
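
A minimal comparison sketch of how the two runs could be checked against each other, assuming the final-layer output tensor for the same prompt has been dumped to the hypothetical files `out_mp2.pt` and `out_mp4.pt` (one per mp setting):

```python
import torch

# Load the dumped final-layer outputs from the two runs (hypothetical file names).
out_mp2 = torch.load("out_mp2.pt").float()
out_mp4 = torch.load("out_mp4.pt").float()

# Summary statistics, analogous to the mean/std shown in the screenshots.
for name, t in (("mp=2", out_mp2), ("mp=4", out_mp4)):
    print(f"{name}: mean={t.mean().item():.4f} std={t.std().item():.4f}")

# Element-wise difference: small discrepancies (on the order of fp16 precision)
# usually point to floating-point accumulation-order differences between
# parallel layouts rather than a logic bug.
diff = (out_mp2 - out_mp4).abs()
print(f"max abs diff={diff.max().item():.6f}, mean abs diff={diff.mean().item():.6f}")
```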

marscrazy commented 1 year ago

The -3.8359 is the mean of the output embedding and 1.9458 is the std with mp size = 4. The mean and std change when mp size = 2.
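
As a hedged sketch of how such statistics could be captured during a run, a standard PyTorch forward hook can print the mean/std of a layer's output; attaching it to the model's final projection (whether that attribute is named `output` is an assumption about the model structure) would reproduce numbers like those above:

```python
import torch
import torch.nn as nn

def attach_stats_hook(layer: nn.Module):
    """Register a forward hook that prints mean/std of the layer's output."""
    def print_stats(module, inputs, output):
        # Cast to float32 for stable statistics regardless of fp16 inference.
        out = output.detach().float()
        print(f"mean={out.mean().item():.4f} std={out.std().item():.4f}")
    return layer.register_forward_hook(print_stats)

# Hypothetical usage:
# hook = attach_stats_hook(model.output)
# ... run generation ...
# hook.remove()
```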