google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models
https://ai.google.dev/gemma
Apache License 2.0
5.3k stars, 508 forks

How to solve the 'RESOURCE_EXHAUSTED' error when loading 'gemma2_instruct_2b_en' (the script is from kaggle and runs on colab with TPU)? #72

Open nicewang opened 1 month ago

nicewang commented 1 month ago

How can I resolve the 'RESOURCE_EXHAUSTED' error when loading 'gemma2_instruct_2b_en'? The script comes from Kaggle and runs on Colab with a TPU. The error is shown below: Image

The environment is a Colab session opened from a Kaggle notebook, using the TPU v2-8 accelerator. Resource usage at the time: RAM: 6.02 GB / 334.56 GB, Disk: 22.13 GB / 225.33 GB.

I changed XLA_PYTHON_CLIENT_MEM_FRACTION from 0.1 to 1.00, but it made no difference: Image
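For reference, a minimal sketch of how this variable is usually set. It assumes the notebook sets it from Python rather than from the shell. The value is only picked up if it is in the environment before JAX is first imported. JAX documents this variable for GPU memory preallocation, and nothing in this thread establishes that it has any effect on a TPU runtime. The helper `requested_mem_fraction` is a hypothetical name used only for illustration.

```python
import os

# Must run before the first `import jax` (or `import keras` with the JAX
# backend); setting it later does not affect an already-started process.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"


def requested_mem_fraction():
    """Return the requested fraction as a float, or None if unset."""
    raw = os.environ.get("XLA_PYTHON_CLIENT_MEM_FRACTION")
    return float(raw) if raw is not None else None


print(requested_mem_fraction())
```

If the variable was set after the imports in the original notebook, restarting the runtime and moving the assignment to the first cell is the usual way to make sure it is read.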

Gopi-Uppari commented 1 month ago

Hi @nicewang,

I hit the same issue on Google Colab with the runtime set to TPU v2-8, but it worked fine on Kaggle with the TPU VM v3-8 runtime. Could you please refer to this Gist file?

@pengchongjin Could you please take a look at this issue?

Thank you.

nicewang commented 1 month ago

Thanks @Gopi-Uppari,

I followed the Colab file you shared, and it now works both on Kaggle with TPU VM v3-8 and on Colab with TPU v2-8.

I am also curious whether the original failure was caused by a dependency conflict, since the only thing I changed was my dependency installation, from: Image to: Image
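One way to compare the two setups is to print the installed versions in each runtime and diff the output. This is a rough sketch using only the standard library. The package names below are assumptions chosen for illustration, because the actual install commands are only visible in the screenshots, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical package list; replace it with the packages your notebook installs.
for name, version in installed_versions(["jax", "keras", "keras-nlp", "tensorflow"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this once after each variant of the install cell shows which versions actually differ between the failing and the working environment.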

Gopi-Uppari commented 2 weeks ago

Hi @nicewang,

I'm glad it worked for you. Yes, a dependency conflict could possibly have been the cause.

Thank you.