Closed UmutAlihan closed 1 year ago
I see that "example.py" iteratively generates prompt answers on both machines (i.e., on instances that each hold part of the layers on their GPUs). Is there any way to use multiple GPUs across multiple machines to deploy a REST API service, so that I can send prompts as requests from other services?
This would be quite easy to accomplish by wrapping the generation call in a Wrapyfi-decorated method. There are several examples at https://github.com/fabawi/wrapyfi that can help you achieve this. Unfortunately, I don't have access to several GPUs at the moment. A PR would be welcome.