Closed: pablogranolabar closed this issue 3 years ago
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Should be addressed.
Taking a look at the pytorch_model.bin saved in the microsoft/DialoGPT-small repository, one can see it's made up of float16 weights. When loading the model into GPT2Model and saving it, the weights are saved as float32, resulting in the large size increase.
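You can confirm this yourself by loading both checkpoints with torch.load and comparing tensor dtypes; a quick sketch (checkpoint paths are illustrative):

```python
import torch

# Compare the checkpoint from the Hub with the one written by save_pretrained.
# Paths are illustrative; point them at your local copies.
for path in ["hub/DialoGPT-small/pytorch_model.bin",
             "exported/DialoGPT-small/pytorch_model.bin"]:
    state_dict = torch.load(path, map_location="cpu")
    dtypes = {t.dtype for t in state_dict.values()}
    size_mb = sum(t.numel() * t.element_size() for t in state_dict.values()) / 1e6
    print(f"{path}: dtypes={dtypes}, ~{size_mb:.0f} MB of tensor data")
```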
If you want to keep the model in half precision, add the following line after initializing your model:
model.half()
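For example, a minimal export sketch that keeps the checkpoint in half precision might look like this (AutoModelForCausalLM and the output directory are illustrative choices):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/DialoGPT-small"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Cast the weights back to float16 so the exported checkpoint stays
# roughly the same size as the one hosted on the Hub.
model.half()

model.save_pretrained("dialogpt-small-fp16")
tokenizer.save_pretrained("dialogpt-small-fp16")
```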
Having a weird issue with DialoGPT-large model deployment. With PyTorch 1.8.0 and Transformers 4.3.3, using model.save_pretrained and tokenizer.save_pretrained, the exported pytorch_model.bin is almost twice the size of the one in the model card repo and results in OOM on a reasonably equipped machine, whereas the standard transformers download process works fine on the same machine (I am building a CI pipeline to containerize the model, hence the requirement for a pre-populated model).
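Roughly, the export step in the pipeline looks like this (the model class and output directory shown here are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/DialoGPT-large"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The pytorch_model.bin written here comes out almost twice the size
# of the one in the model card repo.
model.save_pretrained("model_dir")
tokenizer.save_pretrained("model_dir")
```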
When I download the model card files directly, however, I'm getting the following errors:
So what would be causing the large file-size difference between the save_pretrained output and the model card repo? And any ideas why the directly downloaded model card files aren't working in this example?
Thanks in advance