mit-han-lab / offsite-tuning

Offsite-Tuning: Transfer Learning without Full Model
https://arxiv.org/abs/2302.04870
MIT License

LM loss is nan #11

Closed: QDPeng closed this issue 6 months ago

QDPeng commented 6 months ago

I used the offsite-tuning code to run scripts/figure4/layerdrop.sh and hit this error: RuntimeError: expected scalar type Half but found Float

I then modified offsite_tuning/utils.py, line 659, changing model.adapter = layers[:l] + layers[r:] to model.adapter = layers[:l].half() + layers[r:].half()

With that change the script runs, but another problem appears: the LM loss is nan. The log shows: Epoch 0 - Step 19 - LR: 1.90e-09 - LM loss: nan - KD loss: 0.0000: 1%
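The thread does not confirm why the loss became nan. One common cause in fp16 training, offered here only as an assumption, is overflow: fp16 cannot represent magnitudes above 65504, so a large intermediate value becomes inf, and a later inf - inf yields nan. The sketch below simulates that in plain Python with the standard struct module. to_fp16 is a hypothetical helper written for illustration; it is not part of offsite-tuning or PyTorch.

```python
import math
import struct

# Largest finite value in IEEE 754 half precision (fp16).
FP16_MAX = 65504.0


def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value; overflow saturates to +/-inf.

    Hypothetical helper for illustration only. It mimics what a cast to
    half precision does to a single number.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # The 'e' format packs a float as a 2-byte half-precision value.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; hardware casts give inf.
        return math.copysign(math.inf, x)


# A value that is harmless in fp32 but out of range in fp16.
activation = 70000.0

in_fp32 = activation - activation                    # finite arithmetic
in_fp16 = to_fp16(activation) - to_fp16(activation)  # inf - inf

print("fp32 result:", in_fp32)
print("fp16 result:", in_fp16)
print("is nan:", math.isnan(in_fp16))
```

If overflow is the cause, checking intermediate tensors for inf before the loss is computed would show where the values first leave the fp16 range.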

Environment: Ubuntu 22.04, NVIDIA T4 GPU (16 GB). Model: facebook/opt-1.3b. Dataset: wikitext-2-raw-v1. train_module: adapter.

Can you help me find the cause of the problem? Thanks.

QDPeng commented 6 months ago

You should use an NVIDIA RTX 3090 or a higher-end GPU.