Open · Joanna-0421 opened this issue 1 year ago
Hi @Joanna-0421, if you don't need an HTTP service, there is no need to use EnergonAI; you can just use OPT in Colossal-AI. Thanks.
Hi @binmakeswell, using EnergonAI instead of Colossal-AI should speed up inference on a local machine thanks to features such as non-blocking pipeline parallelism, redundant-padding elimination, and GPU offload, right?
If I do want to run OPT inference on a local machine rather than behind an HTTP service, how should I modify opt_server.py? Could you give some examples?
Hello, I just want to run inference with a pre-trained model in the terminal, without running an HTTP server. How can I do that?
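For reference, below is a minimal sketch of terminal-only inference with no HTTP server. It uses plain Hugging Face transformers rather than EnergonAI's engine, so it forgoes the pipeline-parallel and padding optimizations mentioned above. The model name `facebook/opt-1.3b` and the generation parameters are assumptions, not anything prescribed by EnergonAI or opt_server.py.

```python
# Minimal terminal inference loop (no HTTP server), using Hugging Face
# transformers instead of EnergonAI's engine. Model name and generation
# settings below are assumptions; substitute your own checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/opt-1.3b"  # assumption: any OPT checkpoint should work

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

while True:
    prompt = input("prompt> ").strip()
    if prompt in ("", "quit", "exit"):
        break
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=64,  # assumption: adjust to taste
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
        )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If you do want the EnergonAI optimizations, the same loop structure should apply: replace the HTTP endpoint in opt_server.py with a stdin loop and submit prompts to the engine directly, though the exact engine API depends on your EnergonAI version.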