LM loss is nan - GitHub Issues

I used the offsite-tuning code to run scripts/figure4/layerdrop.sh and hit this error: RuntimeError: expected scalar type Half but found Float

Then I modified offsite_tuning/utils.py, line 659, changing: model.adapter = layers[:l] + layers[r:] to: model.adapter = layers[:l].half() + layers[r:].half()

With that change the script runs, but another problem appears: Epoch 0 - Step 19 - LR: 1.90e-09 - LM loss: nan - KD loss: 0.0000: 1% The LM loss is nan.
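
One possible explanation, offered as a guess rather than a diagnosis: once the adapter weights are cast with .half(), the optimizer may be doing its arithmetic in fp16, where very small values round to zero. The sketch below uses only the Python standard library to emulate fp16 rounding. The Adam-style update and eps=1e-8 are assumptions (the optimizer settings are not stated above), and the helper names to_half, ieee_div and adam_direction are made up for illustration.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the fp16 range; hardware would give inf.
        return math.copysign(math.inf, x)


def ieee_div(a: float, b: float, cast) -> float:
    """Divide like IEEE hardware: x/0 -> inf, 0/0 -> nan (no Python exception)."""
    if b == 0.0:
        return math.nan if a == 0.0 else math.copysign(math.inf, a)
    return cast(a / b)


def adam_direction(grad: float, eps: float = 1e-8, half: bool = False) -> float:
    """First-step Adam-style direction grad / (sqrt(grad^2) + eps).

    Assumption: a simplified Adam step with eps=1e-8. With half=True every
    intermediate result is rounded to fp16.
    """
    cast = to_half if half else (lambda v: v)
    g = cast(grad)
    v = cast(g * g)                    # second moment; tiny squares underflow in fp16
    denom = cast(math.sqrt(v) + eps)   # eps alone is below the fp16 subnormal range
    return ieee_div(g, denom, cast)


print(to_half(1e-8))                      # eps as seen in fp16
print(adam_direction(1e-4, half=False))   # full precision: close to 1
print(adam_direction(1e-4, half=True))    # fp16: denominator rounds to zero
print(adam_direction(0.0, half=True))     # fp16 with a zero gradient: 0/0
```

If something like this is what happens here, a commonly suggested alternative is to keep the trainable weights in fp32 and rely on mixed precision (for example torch.cuda.amp.autocast with a GradScaler) instead of calling .half() on the adapter, but I have not verified that against this code base.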

System: Ubuntu 22.04. GPU: NVIDIA T4, 16 GB. Model: facebook/opt-1.3b. Dataset: wikitext-2-raw-v1. train_module: adapter.

Can you help me find the cause of the problem? Thanks.

mit-han-lab / offsite-tuning

LM loss is nan #11