Closed by yigediaosi 9 months ago
Yes, it's absolutely possible. You just need to point each worker at the controller's address and port. I do this on my clusters.
In fact, I even use Slurm to manage them, so I don't care which compute node each model runs on.
You can check the slurm files on my fork of FastChat, https://github.com/HelmholtzAI-FZJ/FastChat/tree/main
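For context, here is a minimal sketch of the commands involved. The hostnames (`node-0`, `node-1`), ports, and model path are placeholders, and the flag names are taken from FastChat's serve CLI as I understand it; check `--help` on each module for your version.

```shell
# On the controller machine (placeholder hostname: node-0),
# start the FastChat controller that workers will register with.
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001

# On each compute node (e.g. node-1), start a model worker.
# --controller-address points at the controller machine;
# --worker-address must be a URL the controller can reach back on,
# so use the node's own hostname or IP, not localhost.
python3 -m fastchat.serve.model_worker \
    --model-path lmsys/vicuna-7b-v1.5 \
    --controller-address http://node-0:21001 \
    --host 0.0.0.0 --port 21002 \
    --worker-address http://node-1:21002

# Optionally, expose all registered models through a single
# OpenAI-compatible API server on any machine:
python3 -m fastchat.serve.openai_api_server \
    --controller-address http://node-0:21001 \
    --host 0.0.0.0 --port 8000
```

With this layout, adding another model is just launching one more `model_worker` on whichever node has free capacity; the controller routes requests by model name.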
Thank you for your reply.
I want to deploy more than 10 models for users. If they were all deployed on a single machine, I would be concerned about low inference efficiency, and besides, each of our machines has limited space.