
Vanilla PyTorch LLaMA implementation #15

Closed · galatolofederico closed this 1 year ago

galatolofederico commented 1 year ago

Hey, great work with pyllama. I may be wrong, but I noticed that your code checks whether the system has the same number of GPUs as there are checkpoint shards (like here). If that is the case, it means you can only run the 65B version if you have 8 GPUs, but this is not necessary.
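For context, this is the kind of assertion being referred to; a minimal sketch along the lines of Meta's reference loader (the function name and signature here are illustrative, not pyllama's actual API):

```python
from pathlib import Path

import torch

def load_shard(ckpt_dir: str, local_rank: int, world_size: int):
    # Hypothetical sketch: the 65B checkpoint ships as 8 .pth shards,
    # one per model-parallel rank, so this assertion forces you to run
    # with exactly as many GPUs as there are shards.
    checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
    assert world_size == len(checkpoints), (
        f"checkpoint has {len(checkpoints)} shards but world size is {world_size}"
    )
    # Each rank loads only its own shard.
    return torch.load(checkpoints[local_rank], map_location="cpu")
```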

Here you can find a vanilla PyTorch implementation of LLaMA and a weight-conversion script that lets you run LLaMA with as many (or as few) GPUs as you want: https://github.com/galatolofederico/vanilla-llama
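The core idea of the conversion is to merge the model-parallel shards back into a single state dict, after which the model can be partitioned however you like. A rough sketch of that merging step (an illustration of the technique, not the actual vanilla-llama script; the substring matching assumes Meta's original parameter naming):

```python
import torch

def merge_shards(shard_paths):
    # Meta's checkpoints are split with tensor (model) parallelism:
    # column-parallel weights (wq/wk/wv, w1/w3, output) are sharded along
    # dim 0, row-parallel weights (wo, w2) and the token embeddings along
    # dim 1, while norms and rope.freqs are replicated across shards.
    shards = [torch.load(p, map_location="cpu") for p in shard_paths]
    merged = {}
    for name in shards[0]:
        tensors = [shard[name] for shard in shards]
        if "norm" in name or "rope.freqs" in name:
            merged[name] = tensors[0]                  # replicated: keep one copy
        elif ".wo." in name or ".w2." in name or "tok_embeddings" in name:
            merged[name] = torch.cat(tensors, dim=1)   # row-parallel: concat columns
        else:
            merged[name] = torch.cat(tensors, dim=0)   # column-parallel: concat rows
    return merged
```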

Meathelix1 commented 1 year ago

Thank you for this. I have dual 3090s that I would like to try this version on.

mldevorg commented 1 year ago

@juncongmoo Can you please integrate vanilla into pyllama for people with more than 1 GPU?

galatolofederico commented 1 year ago

Integrating it into pyllama would require a huge rewrite. Since vanilla already has an inference server, maybe the best idea would be to implement the missing features (like the playground and the profiler) in vanilla and keep the projects separate. But I don't know, let's see what the pyllama developer thinks.