I used the offsite-tuning code to run scripts/figure4/layerdrop.sh and got the following error:
RuntimeError: expected scalar type Half but found Float
I then modified offsite_tuning/utils.py at line 659, changing:
model.adapter = layers[:l] + layers[r:]
to:
model.adapter = layers[:l].half() + layers[r:].half()
With that change the script runs, but another problem appears:
Epoch 0 - Step 19 - LR: 1.90e-09 - LM loss: nan - KD loss: 0.0000: 1%
The LM loss is NaN.
System: Ubuntu 22.04. GPU: NVIDIA T4, 16 GB.
Model: facebook/opt-1.3b
Dataset: wikitext-2-raw-v1
train_module: adapter
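For reference, here is a minimal sketch of how values behave once rounded to fp16, the dtype the adapter layers get from the .half() change. It is only an illustration, not a confirmed diagnosis. `to_fp16` is a hypothetical helper that uses the Python standard library rather than PyTorch, and the assumption that the optimizer epsilon is left at the Adam default of 1e-8 has not been checked against the training script.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back.

    Hypothetical helper for illustration; not part of offsite-tuning.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack finite values beyond the fp16 range;
        # an fp16 cast on hardware would produce +/-inf instead.
        return math.copysign(math.inf, x)


lr_from_log = 1.90e-09  # learning rate printed in the log line above
adam_eps = 1e-08        # assumption: epsilon left at PyTorch's Adam default

lr_fp16 = to_fp16(lr_from_log)   # below the smallest fp16 subnormal (~5.96e-8)
eps_fp16 = to_fp16(adam_eps)     # also below the smallest fp16 subnormal
big_fp16 = to_fp16(70000.0)      # above the largest finite fp16 value (65504)

print("lr in fp16: ", lr_fp16)
print("eps in fp16:", eps_fp16)
print("70000 in fp16:", big_fp16)
print("inf - inf:", big_fp16 - big_fp16)
```

Under these assumptions, both the learning rate and the epsilon round to zero in fp16, and a single overflow to inf becomes NaN on the next subtraction. This may be related to the NaN loss when the trainable weights are held purely in half precision, but I have not verified that it is the cause here.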
Can you help me find the cause of the problem?
Thanks.