hpcaitech / EnergonAI

Large-scale model inference.
Apache License 2.0

OPT inference #198

Open Joanna-0421 opened 1 year ago

Joanna-0421 commented 1 year ago

Hello, I just want to run inference with a pre-trained model in the terminal, without running an HTTP server. How can I do that?

binmakeswell commented 1 year ago

Hi @Joanna-0421 If you don't need the HTTP service, it seems unnecessary for you to use EnergonAI; you can just use the OPT example in Colossal-AI. Thanks.
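
For reference, if the distributed features are not needed, a pre-trained OPT checkpoint can be run in a terminal with plain Hugging Face Transformers. This is a minimal baseline sketch, not the Colossal-AI example; the checkpoint name and generation settings below are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = 'facebook/opt-1.3b'  # illustrative; any OPT checkpoint works

device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).to(device).eval()

while True:
    prompt = input('prompt> ').strip()
    if not prompt:
        break  # empty line exits
    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=64,
                                    do_sample=True, top_p=0.9)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```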

irasin commented 1 year ago

Hi, @binmakeswell Using EnergonAI instead of Colossal-AI should speed up inference on a local machine thanks to features such as non-blocking pipeline parallelism, redundant padding elimination, and GPU offload, right?

If I do want to run OPT inference on a local machine instead of through the HTTP service, how should I modify opt_server.py? Can you give us some examples?
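
One possible direction for the opt_server.py question: keep the engine setup the example script already performs and replace the FastAPI endpoint with a read-eval loop over stdin. The sketch below mirrors the submit/wait pattern of EnergonAI's OPT example, but the exact engine API varies by version, so treat `engine`, `tokenizer`, and the call signatures here as assumptions to verify against the example scripts.

```python
# Hypothetical sketch: drive the engine built by opt_server.py from stdin
# instead of an HTTP endpoint. `engine` and `tokenizer` are assumed to be
# constructed exactly as in the example script; `submit`/`wait` mirror the
# pattern of EnergonAI's FastAPI example and may differ in your version.

async def repl(engine, tokenizer):
    uid = 0
    while True:
        prompt = input('prompt> ').strip()
        if not prompt:
            break  # empty line exits the loop
        inputs = tokenizer(prompt, return_tensors='pt')
        engine.submit(uid, inputs)       # enqueue the request, as the HTTP handler does
        output = await engine.wait(uid)  # wait for the pipeline to return this request
        print(tokenizer.decode(output, skip_special_tokens=True))
        uid += 1

# Usage (after building `engine` and `tokenizer` as opt_server.py does):
#     import asyncio
#     asyncio.run(repl(engine, tokenizer))
```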