zeno-ml / zeno-build

Build, evaluate, understand, and fix LLM-based apps
MIT License

Support multi-GPU inference for the Hugging Face provider #115

Closed by neubig 8 months ago

neubig commented 1 year ago

For locally hosted models from Hugging Face, it would be good to support multi-GPU inference.

Currently, inference is handled by the Hugging Face provider: https://github.com/zeno-ml/zeno-build/blob/23d30803bf27d5669ab666b5f05c95f5283b780b/zeno_build/models/providers/huggingface_utils.py#L13

Any code to support multi-GPU inference would have to be added there. Contributions are welcome!
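As a starting point, here is a minimal sketch (not the actual zeno-build provider code) of one way to shard a causal LM across all visible GPUs using the `device_map="auto"` option from the transformers/accelerate integration. The model name is only an example, and how this would be wired into `huggingface_utils.py` is left open:

```python
# Minimal sketch: multi-GPU inference via layer sharding with device_map="auto".
# Requires `accelerate` to be installed; the model name below is an example only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"  # example model, not zeno-build's default

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",         # shard layers across available GPUs
    torch_dtype=torch.float16, # half precision to reduce per-GPU memory
)

prompt = "Zeno helps you evaluate"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This covers simple model-parallel sharding; data-parallel generation across GPUs would need a different approach and is not shown here.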