meta-llama / llama

Inference code for Llama models

xformers #232

Open FrancescoSaverioZuppichini opened 1 year ago

FrancescoSaverioZuppichini commented 1 year ago

Hi 👋

Thanks for the amazing work. In the paper the authors say xformers was used, but I don't see it here.

Thanks,

Fra

Fangzhou-Ai commented 1 year ago

Yes, xformers can be adopted in the attention layer of the llama model. I recently submitted PR #679 to enable xformers.
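
For anyone who wants to see what the swap amounts to, here is a minimal sketch, not the actual contents of PR #679: it assumes the tensor shapes used in the reference model.py, i.e. (batch, seq_len, n_heads, head_dim), and no KV cache for the causal case, and the helper names are mine. It contrasts the explicit attention math written out in this repo with a single call to xformers' memory_efficient_attention.

```python
# A minimal sketch, not the actual contents of PR #679: the explicit
# score/softmax/matmul attention that this repo writes out, next to the
# equivalent call into xformers' fused kernel. Shapes follow the reference
# model.py: (batch, seq_len, n_heads, head_dim).
import math

import torch
import torch.nn.functional as F
import xformers.ops as xops


def explicit_attention(xq, keys, values, mask=None):
    """Attention written out step by step, as in llama/model.py."""
    # move heads in front of the sequence dim: (bs, n_heads, seq_len, head_dim)
    xq, keys, values = (t.transpose(1, 2) for t in (xq, keys, values))
    scores = torch.matmul(xq, keys.transpose(2, 3)) / math.sqrt(xq.shape[-1])
    if mask is not None:
        scores = scores + mask  # additive mask, e.g. -inf on the upper triangle
    scores = F.softmax(scores.float(), dim=-1).type_as(xq)
    return torch.matmul(scores, values).transpose(1, 2)


def xformers_attention(xq, keys, values, causal=True):
    """Same computation via xformers.ops.memory_efficient_attention."""
    # LowerTriangularMask assumes query and key have the same length;
    # decoding with a KV cache would need a different attention bias.
    bias = xops.LowerTriangularMask() if causal else None
    # xformers expects (bs, seq_len, n_heads, head_dim), so no transpose here.
    return xops.memory_efficient_attention(xq, keys, values, attn_bias=bias)
```

The speed and memory win comes from the fused kernel never materializing the full (seq_len x seq_len) score matrix, which is exactly the intermediate the explicit version builds.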

FrancescoSaverioZuppichini commented 1 year ago

> Yes, xformers can be adopted in the attention layer of the llama model. I recently submitted PR #679 to enable xformers.

Then the authors lied about it, if you are only now submitting a PR for it.

Fangzhou-Ai commented 1 year ago

Well "lying" is a very serious charge for a published paper and I do believe the authors employed xformers during their in-house training procedure. From a closed issue also related to xformers in this repo, it seems that this llama model is more likely to serve as an educational purpose, thus attention part is explicitly written down to demonstrate the mathematical process. I recently noticed that lots of people who are not familiar with the inner details of llama are using this repo directly without any optimization, hence I submitted a "patch" to apply a faster attention as a "free lunch" for every one.

FrancescoSaverioZuppichini commented 1 year ago

Well "lying" is a very serious charge for a published paper and I do believe the authors employed xformers during their in-house training procedure. From a closed issue also related to xformers in this repo, it seems that this llama model is more likely to serve as an educational purpose, thus attention part is explicitly written down to demonstrate the mathematical process. I recently noticed that lots of people who are not familiar with the inner details of llama are using this repo directly without any optimization, hence I submitted a "patch" to apply a faster attention as a "free lunch" for every one.

So you are saying that the code here is not what was actually used, which would mean they lied when claiming to "open source" the code, since it is not what they used. Or, given your thesis, why remove xformers if they already had it implemented?

Not sure. Well, it's from Meta at the end of the day, so I wouldn't say they are the most honest company; even if all the authors are the best people in the world, they still have to bend to the company.

FrancescoSaverioZuppichini commented 1 year ago

Well "lying" is a very serious charge for a published paper and I do believe the authors employed xformers during their in-house training procedure. From a closed issue also related to xformers in this repo, it seems that this llama model is more likely to serve as an educational purpose, thus attention part is explicitly written down to demonstrate the mathematical process. I recently noticed that lots of people who are not familiar with the inner details of llama are using this repo directly without any optimization, hence I submitted a "patch" to apply a faster attention as a "free lunch" for every one.

By the way, thanks a lot for the patch!