modular-ml / wrapyfi-examples_llama

Inference code for facebook LLaMA models with Wrapyfi support
GNU General Public License v3.0

Running on CPUs? #8

Closed fedelrick closed 1 year ago

fedelrick commented 1 year ago

This isn't really an issue, but I'm trying to find a way to link multiple mobile/laptop devices together so that, essentially, each one contributes its CPU. Is that doable with this fork? Any suggestions or tips would be welcome!

fabawi commented 1 year ago

The current version does not support CPU distribution. It should be possible with Wrapyfi, though, since Wrapyfi handles both CPU and GPU tensors.

https://github.com/modular-ml/wrapyfi-examples_llama/blob/e066c4b9b08341ed768bf54ea352190cdd108f96/llama/model.py#L302

and

https://github.com/modular-ml/wrapyfi-examples_llama/blob/e066c4b9b08341ed768bf54ea352190cdd108f96/llama/model.py#L308

For both calls, you need to pass a device="cpu" argument, but you'd still have to modify the original implementation to run on the CPU. Wrapyfi only adds a layer that lets you distribute tensors across devices and machines; I merely adapted LLaMA to demonstrate that capability.
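For illustration, a minimal sketch of what such a change could look like follows. The class name, middleware, topic, and method below are placeholders rather than the exact ones at the linked lines in model.py; the only detail taken from the comment above is passing device="cpu" to the registered calls. The default-dtype change reflects the kind of CPU edit the reference LLaMA code would typically need, since its setup defaults to torch.cuda.HalfTensor:

```python
import torch
from wrapyfi.connect.wrapper import MiddlewareCommunicator

# The reference LLaMA setup defaults to torch.cuda.HalfTensor; on the CPU
# you would switch to a CPU dtype instead (fp16 ops are poorly supported
# on most CPUs):
torch.set_default_tensor_type(torch.FloatTensor)


class Transformer(MiddlewareCommunicator):
    # Hypothetical registration: the message type, middleware ("zeromq"),
    # and topic are illustrative. The relevant change is device="cpu",
    # which, per the comment above, tells Wrapyfi to keep the exchanged
    # tensors on the CPU.
    @MiddlewareCommunicator.register(
        "NativeObject", "zeromq", "Transformer", "/llama/transformer_block",
        carrier="tcp", should_wait=True, device="cpu",
    )
    def forward_block(self, h: torch.Tensor, start_pos: int):
        # ... original forward computation, now on CPU tensors ...
        return (h,)
```

Beyond the Wrapyfi calls, the original implementation still assumes CUDA elsewhere (e.g. `.cuda()` calls and a distributed backend chosen for GPUs), so those spots would need CPU equivalents as well.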