Open YJHMITWEB opened 1 year ago
Hi, currently in the examples, only `linear` describes a naive example of offload; in other projects such as `opt`, `bloom`, and `gpt`, there is no offload option. I am wondering how to apply offloading to large-model inference. Are there any examples?

Hi @YJHMITWEB This is technically feasible, but it would cause a sharp drop in inference speed, so its practical value is limited and we do not currently consider it a high priority. You are welcome to submit a proposal or PR to help build this out. Thanks.
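For reference, below is a minimal sketch of what naive per-layer offloading looks like in plain PyTorch. This is not this project's API (the `OffloadedStack` class and its names are made up for illustration); it only shows the general pattern: keep the weights in CPU memory and stream each layer to the GPU just for its own forward pass. It also makes the speed concern above concrete, since every layer's weights cross PCIe on every forward.

```python
# Minimal sketch of naive per-layer weight offloading in plain PyTorch.
# Hypothetical helper for illustration only, not this project's API.
import torch
import torch.nn as nn


class OffloadedStack(nn.Module):
    """Keeps all layers in CPU memory and moves each one to the GPU
    only for the duration of its own forward pass."""

    def __init__(self, layers: nn.ModuleList, device: str = "cuda"):
        super().__init__()
        self.layers = layers.cpu()       # parameters live on the CPU
        self.device = device

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to(self.device)
        for layer in self.layers:
            layer.to(self.device)        # copy this layer's weights host -> device
            x = layer(x)
            layer.to("cpu")              # release GPU memory before the next layer
        return x


if __name__ == "__main__":
    # Toy "model": a stack of linear layers that might not all fit on the GPU at once.
    hidden = 4096
    stack = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(8))
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = OffloadedStack(stack, device=device)

    with torch.no_grad():
        out = model(torch.randn(1, hidden))
    print(out.shape)
```

Libraries such as Hugging Face Accelerate automate roughly this pattern (e.g. `device_map="auto"` with an offload folder), but the host-to-device copies remain the bottleneck, which is why offloaded inference is much slower than keeping the whole model on the GPU.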