tanreinama / GPTSAN

General-purpose Switch Transformer based Japanese language model
MIT License

Cannot use the pretrained model on Hugging Face #7

Closed: omihub777 closed this issue 1 year ago

omihub777 commented 1 year ago

Thank you for your great work! I'm trying to use your model Tanrei/GPTSAN-japanese from Hugging Face (link) on Google Colaboratory, but I run into the error below. I'd appreciate it if you could elaborate on how to solve this issue. Thank you in advance!

tanreinama commented 1 year ago

The pull request to Hugging Face is not yet merged, so to use the model you need to download and use the source code from this repository. In addition, the free tier of Google Colab cannot run it due to insufficient main memory. Prepare a high-memory environment.

tanreinama commented 1 year ago

If there are no further questions, I will close this issue.

omihub777 commented 1 year ago

oh, my bad. it's not merged yet. thanks!

younesbelkada commented 1 year ago

As a side note, you can probably decrease the memory requirement of the model by doing:

```python
import torch
from transformers import AutoModel

# ckpt = "Tanrei/GPTSAN-japanese"
model = AutoModel.from_pretrained(ckpt, torch_dtype=torch.float16)
```

for loading in half precision. We can also work on making the model 8-bit compatible so that it can be loaded in 8-bit.
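To see why lower precision helps with the memory problem discussed above, here is a back-of-the-envelope sketch of weight memory at different precisions. The parameter count is a placeholder for illustration, not the actual GPTSAN model size, and this counts weights only (activations and any optimizer state add more):

```python
# Rough estimate of model weight memory at different precisions.
# NOTE: the parameter count below is hypothetical, not GPTSAN's real size.

def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

params = 3_000_000_000  # hypothetical 3B-parameter model

fp32 = weight_memory_gib(params, 4)  # float32: 4 bytes per parameter
fp16 = weight_memory_gib(params, 2)  # float16: 2 bytes per parameter
int8 = weight_memory_gib(params, 1)  # int8:    1 byte per parameter

print(f"fp32: {fp32:.1f} GiB, fp16: {fp16:.1f} GiB, int8: {int8:.1f} GiB")
```

Half precision halves the footprint relative to float32, and 8-bit quantization quarters it, which is why `torch_dtype=torch.float16` (and, once supported, `load_in_8bit=True` via bitsandbytes) can make the difference on a memory-constrained instance.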