modular-ml / wrapyfi-examples_llama

Inference code for facebook LLaMA models with Wrapyfi support
GNU General Public License v3.0

Running on CPUs? #8

Closed fedelrick closed 1 year ago

fedelrick commented 1 year ago

This isn't really an issue, but I'm trying to find a way to link multiple mobile/laptop devices together so that, essentially, each one contributes its CPU. Is that doable with this fork? Any suggestions or tips would be welcome!

fabawi commented 1 year ago

The current version does not support CPU distribution. It should be possible with Wrapyfi, though, since Wrapyfi handles both CPU and GPU tensors.

https://github.com/modular-ml/wrapyfi-examples_llama/blob/e066c4b9b08341ed768bf54ea352190cdd108f96/llama/model.py#L302

and

https://github.com/modular-ml/wrapyfi-examples_llama/blob/e066c4b9b08341ed768bf54ea352190cdd108f96/llama/model.py#L308

For both calls, you need to pass a device="cpu" argument, but you'd still have to modify the original implementation to run on the CPU. Wrapyfi only adds a layer that lets you distribute tensors across devices and machines; I merely adapted LLaMA to demonstrate that capability.
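For illustration, a minimal sketch of what such a change could look like follows. The class name, middleware, topic, and method below are placeholders rather than the exact ones at the linked lines in model.py; the only detail taken from the comment above is passing device="cpu" to the registered calls. The default-dtype change reflects the kind of CPU edit the reference LLaMA code would typically need, since its setup defaults to torch.cuda.HalfTensor:

```python
import torch
from wrapyfi.connect.wrapper import MiddlewareCommunicator

# The reference LLaMA setup defaults to torch.cuda.HalfTensor; on the CPU
# you would switch to a CPU dtype instead (fp16 ops are poorly supported
# on most CPUs):
torch.set_default_tensor_type(torch.FloatTensor)


class Transformer(MiddlewareCommunicator):
    # Hypothetical registration: the message type, middleware ("zeromq"),
    # and topic are illustrative. The relevant change is device="cpu",
    # which, per the comment above, tells Wrapyfi to keep the exchanged
    # tensors on the CPU.
    @MiddlewareCommunicator.register(
        "NativeObject", "zeromq", "Transformer", "/llama/transformer_block",
        carrier="tcp", should_wait=True, device="cpu",
    )
    def forward_block(self, h: torch.Tensor, start_pos: int):
        # ... original forward computation, now on CPU tensors ...
        return (h,)
```

Beyond the Wrapyfi calls, the original implementation still assumes CUDA elsewhere (e.g. `.cuda()` calls and a distributed backend chosen for GPUs), so those spots would need CPU equivalents as well.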