Closed: galatolofederico closed this issue 1 year ago

Hey, great work with pyllama. I may be wrong, but I noticed that your code checks whether the system has the same number of GPUs as there are checkpoint shards (like here). If that is the case, it means you can only run the 65B version if you have 8 GPUs, but this is not necessary. Here you can find a vanilla PyTorch implementation of LLaMA, along with a weights-conversion script, that you can use to run LLaMA with as many (or as few) GPUs as you want: https://github.com/galatolofederico/vanilla-llama
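For context, the check in question looks roughly like the sketch below. This is a paraphrase of the shard-loading logic in Meta's reference implementation, which pyllama follows; the exact code in pyllama may differ, and `load_checkpoint` is a hypothetical name:

```python
from pathlib import Path


def load_checkpoint(ckpt_dir: str, world_size: int):
    # One .pth file per model-parallel rank: the 65B model ships as
    # 8 shards, so this assertion forces world_size == 8 for 65B.
    checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
    assert world_size == len(checkpoints), (
        f"checkpoint has {len(checkpoints)} shards "
        f"but world size is {world_size}"
    )
    ...
```

Nothing about the weights themselves requires this pairing; the shards can be consolidated offline and then run on any number of GPUs, which is the point of the conversion script.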
Thank you for this, I have dual 3090s that I would like to try this version on. @juncongmoo Can you please integrate vanilla-llama into pyllama for people with more than one GPU?
Integrating it into pyllama would require a huge rewrite. Since vanilla-llama already has an inference server, maybe the best idea would be to implement the missing features (like the playground and the profiler) in vanilla-llama and keep the projects separate. But I don't know; let's see what the pyllama developer thinks.
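To make the conversion idea from the original issue concrete, here is a minimal sketch of consolidating model-parallel shards into a single checkpoint. This is a hypothetical simplification, not the actual vanilla-llama script; the split dimensions assumed below follow the column-/row-parallel layout of Meta's reference checkpoints:

```python
from pathlib import Path

import torch

# Assumed split dims in Meta's model-parallel checkpoints:
# column-parallel layers are split along dim 0, row-parallel
# layers (and the embedding) along dim 1.
DIM0 = ("wq", "wk", "wv", "w1", "w3", "output")
DIM1 = ("wo", "w2", "tok_embeddings")


def merge_shards(ckpt_dir: str, out_path: str) -> None:
    shard_paths = sorted(Path(ckpt_dir).glob("*.pth"))
    shards = [torch.load(p, map_location="cpu") for p in shard_paths]

    merged = {}
    for name, tensor in shards[0].items():
        parts = [s[name] for s in shards]
        if any(key in name for key in DIM0):
            merged[name] = torch.cat(parts, dim=0)
        elif any(key in name for key in DIM1):
            merged[name] = torch.cat(parts, dim=1)
        else:
            # Norm weights and other replicated params are identical
            # across shards; keep a single copy.
            merged[name] = tensor
    torch.save(merged, out_path)
```

Once the weights are consolidated like this, they can be re-partitioned (or not) to match however many GPUs are available, which is what removes the fixed 8-GPU requirement for the 65B model.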