hpcaitech / EnergonAI

Large-scale model inference.
Apache License 2.0

inference of pre-trained model #125

Open Emerald01 opened 2 years ago

Emerald01 commented 2 years ago

Hi, I am very interested in the distributed inference of Colossal-AI. We have pre-trained NLP models from PyTorch and JAX, and I wonder whether it is possible, and what would need to be done, to use EnergonAI for their inference. At the inference (model production) stage, the requirement for a smaller model footprint is much more pressing than at the training stage; imagine an NLP model server producing results for clients.

From your documentation:

> For models trained by [Colossal-AI](https://github.com/hpcaitech/ColossalAI), they can be seamlessly transferred to Energon-AI. For single-device models, they require manual coding works to introduce tensor parallelism and pipeline parallelism.

I do not have a clear idea of how this relates to my question. If you have some examples, I would be eager to study them.
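For context, here is a minimal sketch of what the "manual coding to introduce tensor parallelism" step typically entails. This is framework-agnostic plain PyTorch, not EnergonAI's actual API; `ColumnParallelLinear` is an illustrative name. The idea is to shard a layer's weight across ranks and gather the partial outputs:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F

class ColumnParallelLinear(nn.Module):
    """Linear layer whose output features are sharded across ranks.

    Each rank holds out_features // world_size rows of the weight,
    computes its local shard of the output, and the shards are
    gathered back along the feature dimension.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world_size = dist.get_world_size()
        assert out_features % world_size == 0, "out_features must divide evenly"
        self.local_out = out_features // world_size
        self.weight = nn.Parameter(torch.empty(self.local_out, in_features))
        self.bias = nn.Parameter(torch.zeros(self.local_out))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Local partial result over this rank's shard of output features.
        local_y = F.linear(x, self.weight, self.bias)  # (..., local_out)
        # For inference a plain all_gather suffices; training would need
        # an autograd-aware gather instead.
        shards = [torch.empty_like(local_y) for _ in range(dist.get_world_size())]
        dist.all_gather(shards, local_y)
        return torch.cat(shards, dim=-1)  # (..., out_features)
```

Converting a single-device model means replacing layers like `nn.Linear` with sharded counterparts such as this and loading the corresponding slice of the pre-trained weights on each rank.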

As for Microsoft DeepSpeed, they claim:

> DeepSpeed provides a seamless inference mode for compatible transformer based models trained using DeepSpeed, Megatron, and HuggingFace, meaning that we don't require any change on the modeling side such as exporting the model or creating a different checkpoint from your trained checkpoints.

I am wondering whether Colossal-AI has a similar capability.
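For comparison, the DeepSpeed inference mode quoted above looks roughly like the sketch below. It assumes the `transformers` and `deepspeed` packages; `mp_size=2` is an illustrative choice, and parameter names follow the `deepspeed.init_inference` API of that era (newer releases have changed some argument names):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wraps the trained checkpoint in place; no re-export or
# checkpoint conversion is needed.
engine = deepspeed.init_inference(
    model,
    mp_size=2,                       # number of model-parallel GPUs
    dtype=torch.half,                # run in fp16
    replace_with_kernel_inject=True, # swap in optimized inference kernels
)

inputs = tokenizer("Hello, world", return_tensors="pt").to("cuda")
outputs = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

A script like this would normally be launched with the `deepspeed` launcher (e.g. `deepspeed --num_gpus 2 script.py`) so that one process is spawned per GPU.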

dujiangsu commented 2 years ago

Yes, models trained by ColossalAI can be easily transferred to EnergonAI for deployment. We are currently preparing a demo that serves a GPT-3-scale model.