hpcaitech / ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI
Apache License 2.0

Load ColossalAI GPT model as HuggingFace/Transformers Model #199

Open Red-Giuliano opened 1 year ago

Red-Giuliano commented 1 year ago

Describe the feature

Hi all,

I'm trying to run inference with a GPT model I trained using ColossalAI through huggingface/transformers, but the checkpoint can't be loaded as a HuggingFace model since it's a plain PyTorch implementation. How can I go about loading the model I trained with the huggingface/transformers library?
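A minimal sketch of the failure I'm seeing (the file name and the stock "gpt2" checkpoint are just from my setup):

```python
import torch
from transformers import GPT2LMHeadModel

# The ColossalAI example saves a plain PyTorch state dict ("model.pt"),
# not a save_pretrained() directory, so from_pretrained() can't read it.
state_dict = torch.load("model.pt", map_location="cpu")

# Loading the weights into a stock GPT-2 also fails, because the parameter
# names in the checkpoint don't match transformers' GPT-2 naming scheme.
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.load_state_dict(state_dict)  # RuntimeError: missing/unexpected keys
```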

Thanks so much for your help.

Best, Red

feifeibear commented 1 year ago

Hi, can you tell me how you are using the GPT model you trained with ColossalAI in huggingface/transformers? Pointing out which example your implementation is based on would be helpful.

Red-Giuliano commented 1 year ago

Hi feifeibear,

Thanks so much for your reply. The code I used to train the model is adapted from the /language/gpt/ example. I created a smaller version of the gpt2_vanilla configuration because my task did not require a model quite that large.

Now I have the model.pt file that I saved. When I try to load it with the transformers library, though, I run into problems (which makes sense, since the GPT model is imported from the titans module, not from transformers). I'd love to use this model with the huggingface/transformers library so that I can take advantage of the functionality within that ecosystem.
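As a first step, this is roughly how I'm inspecting the checkpoint to see what keys it actually contains (the "model" wrapper key is a guess; the exact structure depends on how the example saves it):

```python
import torch

ckpt = torch.load("model.pt", map_location="cpu")

# Some training scripts wrap the weights, e.g. {"model": state_dict, "epoch": ...};
# unwrap if necessary ("model" is a guessed key -- adjust to what's actually there).
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Print every parameter name and shape to compare against transformers' GPT-2 keys.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```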

From the research I've done, it seems the transformers library expects a state dict with specific keys for each layer, so I'm working on whether there's a way to resolve that discrepancy (a rough conversion sketch is at the end of this comment). I know the library is supported at some level because of this blog post:

https://medium.com/@yangyou_berkeley/colossal-ai-seamlessly-accelerates-large-models-at-low-costs-with-hugging-face-4d1a887e500d

But I would love some more advice for my use case. Thanks so much once again for your time and help!
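For concreteness, here is the rough conversion I'm experimenting with. The GPT2Config values are placeholders for my smaller model, and the rename() rules below are hypothetical; the real mapping has to be derived by diffing the two key lists printed above:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Placeholder hyperparameters -- these must match the model I actually trained.
config = GPT2Config(vocab_size=50257, n_positions=1024,
                    n_embd=512, n_layer=6, n_head=8)
hf_model = GPT2LMHeadModel(config)

src = torch.load("model.pt", map_location="cpu")

def rename(key: str) -> str:
    # Hypothetical renaming rules: translate titans parameter names into
    # transformers' GPT-2 scheme. Derive the real rules by comparing key lists.
    key = key.replace("embed.word_embeddings", "transformer.wte")
    key = key.replace("embed.position_embeddings", "transformer.wpe")
    key = key.replace("blocks.", "transformer.h.")
    return key

converted = {rename(k): v for k, v in src.items()}

# Note: transformers' GPT-2 uses Conv1D projections whose weight matrices are
# transposed relative to nn.Linear, so some tensors may also need .t().

# strict=False reports missing/unexpected keys instead of raising, which
# makes it easy to iterate on the mapping.
missing, unexpected = hf_model.load_state_dict(converted, strict=False)
print("missing:", missing)
print("unexpected:", unexpected)

hf_model.save_pretrained("converted-gpt2")
```

Once the missing/unexpected lists are empty, the saved directory should load with GPT2LMHeadModel.from_pretrained("converted-gpt2") like any other HF checkpoint.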