Open YJHMITWEB opened 1 year ago
Hi, currently in the examples, only `linear` describes a naive example of offload; in other projects such as `opt`, `bloom`, and `gpt`, there is no offload option. I am wondering how to apply offloading to large-model inference. Are there any examples?

Hi @YJHMITWEB This is technically feasible, but it would cause a sharp drop in inference speed, so its practical value is limited and we do not currently consider it a high priority. You are welcome to submit a proposal or PR to help build this out. Thanks.
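For reference, below is a minimal sketch of what naive per-layer offloading looks like in plain PyTorch. This is not this project's API (the `OffloadedStack` class and its names are made up for illustration); it only shows the general pattern: keep the weights in CPU memory and stream each layer to the GPU just for its own forward pass. It also makes the speed concern above concrete, since every layer's weights cross PCIe on every forward.

```python
# Minimal sketch of naive per-layer weight offloading in plain PyTorch.
# Hypothetical helper for illustration only, not this project's API.
import torch
import torch.nn as nn


class OffloadedStack(nn.Module):
    """Keeps all layers in CPU memory and moves each one to the GPU
    only for the duration of its own forward pass."""

    def __init__(self, layers: nn.ModuleList, device: str = "cuda"):
        super().__init__()
        self.layers = layers.cpu()       # parameters live on the CPU
        self.device = device

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to(self.device)
        for layer in self.layers:
            layer.to(self.device)        # copy this layer's weights host -> device
            x = layer(x)
            layer.to("cpu")              # release GPU memory before the next layer
        return x


if __name__ == "__main__":
    # Toy "model": a stack of linear layers that might not all fit on the GPU at once.
    hidden = 4096
    stack = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(8))
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = OffloadedStack(stack, device=device)

    with torch.no_grad():
        out = model(torch.randn(1, hidden))
    print(out.shape)
```

Libraries such as Hugging Face Accelerate automate roughly this pattern (e.g. `device_map="auto"` with an offload folder), but the host-to-device copies remain the bottleneck, which is why offloaded inference is much slower than keeping the whole model on the GPU.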