Closed · @musram closed this issue 1 week ago
Hi @musram, thank you for the question. For now we don't support running on heterogeneous GPUs. We will work on introducing REST-API-level protocols to bring different servers (which may run on different GPUs) together.
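To illustrate the direction described above, here is a minimal sketch of what REST-level coordination could look like: a coordinator that routes each inference request to one of several servers, each possibly running on a different GPU backend. Everything here (the `Backend`/`Coordinator` names, the endpoints, the round-robin policy) is a hypothetical assumption for illustration, not an existing MLC LLM API.

```python
# Hypothetical sketch: route requests across heterogeneous MLC LLM servers.
# None of these names or endpoints are real MLC LLM APIs; they only
# illustrate the REST-level coordination idea from the reply above.

from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    url: str      # e.g. an OpenAI-compatible /v1/chat/completions endpoint
    device: str   # "metal", "vulkan", "cuda", ...


class Coordinator:
    """Round-robin router over heterogeneous backends (sketch only)."""

    def __init__(self, backends):
        self._ring = cycle(backends)

    def pick(self):
        # A real coordinator would weight by throughput or free VRAM;
        # this sketch simply rotates through the registered servers.
        return next(self._ring)


backends = [
    Backend("http://mac-mini.local:8000", "metal"),   # Apple GPU via Metal
    Backend("http://linux-box.local:8000", "vulkan"), # desktop GPU via Vulkan
]
coord = Coordinator(backends)
picked = [coord.pick().device for _ in range(4)]
print(picked)  # alternates between the two devices
```

A production version would forward each request to `Backend.url` over HTTP and handle failures, but the routing layer itself can stay this simple.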
🚀 Feature
Support for heterogeneous devices
Motivation
Run inference using heterogeneous GPUs (e.g., shard a model across Metal and Vulkan simultaneously), which would let me use my household devices for inference.
Is it possible to share some direction on how to implement this in MLC LLM?
@tqchen @junrushao @MasterJH5574