huggingface / blog

Public repo for HF blog posts
https://hf.co/blog

assisted model offload #1467

manjeetbhati opened this issue 1 year ago

manjeetbhati commented 1 year ago

Is there a library I could use to distribute model loading between the GPU and CPU? I have a GPU with 16 GB of memory and tried https://huggingface.co/blog/assisted-generation (models up to 1.3B parameters work fine), but models with 6.7B parameters and beyond fail to load because of the memory they require. Is there a library I could use to share the load between the CPU and GPU?

SunMarc commented 1 year ago

Check this doc from the Accelerate library. You can use big model inference directly by passing `device_map` to `from_pretrained` if you are using the Transformers library!
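
For reference, here is a minimal sketch of what that looks like through the Transformers API. The checkpoint name is only an example (substitute whichever model you were loading from the assisted-generation post), and the `max_memory` line is an optional assumption about your hardware limits:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "facebook/opt-6.7b"  # example checkpoint, swap in the model you are actually using

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",          # let Accelerate split layers across the GPU, CPU (and disk if needed)
    torch_dtype=torch.float16,  # roughly halves the memory footprint compared to fp32
    # max_memory={0: "14GiB", "cpu": "30GiB"},  # optionally cap per-device usage for a 16 GB GPU
)

# Inputs go to the first GPU, where the input embeddings are usually placed.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `device_map="auto"`, layers that do not fit on the GPU are kept on the CPU (and offloaded to disk as a last resort), so the 6.7B model can load at the cost of slower generation for the offloaded layers.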