philschmid / deep-learning-pytorch-huggingface

MIT License

ValueError #14

Open Martok10 opened 1 year ago

Martok10 commented 1 year ago

Hi Phil,

thank you for sharing this useful blog post on how to fine-tune flan-t5-xxl efficiently. I am trying to run your code on Google Colab with GPU enabled, but I run into a ValueError when trying to load the sharded model "philschmid/flan-t5-xxl-sharded-fp16". This is the error message I am getting:

[screenshot of the error message]

when trying to execute this code cell:

[screenshot of the code cell]

Any help would be highly appreciated.

Thanks,

Max

philschmid commented 1 year ago

Which GPU are you using? You need at least 24 GB. If you have that, it is possible that the cell where you load the model was run multiple times.

Martok10 commented 1 year ago

I am running it on a free Colab instance, so I am limited to 15 GB of GPU RAM and 12 GB of system RAM. I do not think the problem is due to limited RAM, because when the notebook crashes I am far from reaching those limits. Maybe it is due to a different environment setup on Colab free as compared to the Pro version. Anyway, would it be a lot of effort to adapt the code to the FLAN base version? Maybe that could run on a free Colab instance...

philschmid commented 1 year ago

15 GB of GPU RAM is not enough to load the model in int8. That's why you see the error. Yes, you can adjust the example by changing the model_id; you can try the xl version, which is 3B parameters.
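For illustration, a minimal sketch of swapping in a smaller FLAN-T5 variant, assuming the loading cell looks like the one in the original notebook (the `google/flan-t5-xl` and `google/flan-t5-large` Hub IDs are the public checkpoints, not part of this thread; the import is kept inside the function so the snippet stays importable on a machine without a GPU):

```python
model_id = "google/flan-t5-xl"  # ~3B params; try "google/flan-t5-large" (~780M) on free Colab

def load_model(model_id: str):
    """Load a seq2seq model in 8-bit, as in the original notebook."""
    # Imported lazily: load_in_8bit needs bitsandbytes + a CUDA GPU,
    # and device_map="auto" needs the accelerate package installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        model_id, load_in_8bit=True, device_map="auto"
    )
    return tokenizer, model
```

Only `model_id` changes relative to the xxl setup; the rest of the fine-tuning code should work as-is.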

Martok10 commented 1 year ago

Thank you for your help :) The xl version ran out of memory too, but the large version ran just fine. It yields a ROUGE-1 score of 48.56% on the test set. System/GPU RAM never exceeded 8.5 GB / 4 GB.
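For readers unfamiliar with the metric: the notebook's evaluation uses the `evaluate`/`rouge_score` packages, but ROUGE-1 itself is just a unigram-overlap F-measure. A minimal pure-Python sketch (omitting the stemming and tokenization details of the real implementation):

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """Minimal ROUGE-1 F-measure: unigram overlap between reference and candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each candidate unigram counts at most as often
    # as it appears in the reference.
    overlap = sum(min(cand_counts[w], ref_counts[w]) for w in cand_counts)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f("the cat sat on the mat", "the cat lay on the mat"))  # → 0.8333...
```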