lapp0 opened this issue 4 months ago (Open)
Interesting and thanks for the investigation!
Hi, I'm working on a project that requires fine-tuning llama-3-8b-Instruct-bnb-4bit on a custom dataset. However, whenever I try to increase `per_device_batch_size` from 1 to any higher value, I hit the same error: `RuntimeError: Function 'LoRA_MLPBackward' returned nan values in its 0th output`. My notebook is very similar to the notebooks provided by Unsloth. I have read through the existing issues, but my knowledge was not enough to understand the error. Is there anything I can do to fix this problem?
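Roughly, the relevant part of my notebook looks like this (a simplified sketch following the standard Unsloth notebook setup; the dataset, hyperparameters, and output path are placeholders):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

# `model`, `tokenizer`, and `dataset` come from the usual Unsloth setup
# (FastLanguageModel.from_pretrained + get_peft_model) and are omitted here.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # raising this from 1 triggers the nan error
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=60,
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```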
Oh apologies, I have not solved this issue yet - I'll try to take a look again, but can't guarantee anything, sorry :(
Thank you for your attention and time.
I hit the same problem: whenever I call `model.train()` and use left padding, the loss is not differentiable and the logits at the padding positions are all zero.
I also have no idea why I cannot backpropagate the loss even when I do not call `model.train()`. I tried this simple example on the provided Llama 3.1 Colab:
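The example is roughly the following (a simplified sketch; the model name and prompts are placeholders following the Llama 3.1 Colab setup):

```python
import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Left padding, as in the comment above.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
batch = tokenizer(["hello", "a somewhat longer example sentence"],
                  return_tensors="pt", padding=True).to("cuda")

out = model(**batch, labels=batch["input_ids"])
out.loss.backward()  # this is where the loss fails to backpropagate
```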
The error is the same as before. Could anyone tell me what I am doing wrong, and why my Llama loss cannot be backpropagated even in this simple case?
I'll investigate your new issue!
If pad tokens are used and `model.eval(); model.train()` is called, the Unsloth backward pass is undifferentiable, resulting in `nan`.

Reproduction script (sketched below):
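A minimal sketch of the reproduction, assuming the standard Unsloth `FastLanguageModel` setup; `get_loss_and_backwards` matches the names used in the overview below, and the model name and prompts are placeholders:

```python
import torch
from unsloth import FastLanguageModel

# model/tokenizer setup as in the standard Unsloth Colab.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
tokenizer.pad_token = tokenizer.eos_token  # pad tokens are required to hit the bug

def get_loss_and_backwards(model, train_eval: bool):
    # Optionally switch to eval() and back to train() before the forward pass;
    # this mode toggle is what makes the backward pass undifferentiable.
    if train_eval:
        model.eval()
        model.train()
    batch = tokenizer(["short", "a much longer sentence that forces padding"],
                      return_tensors="pt", padding=True).to("cuda")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()  # with train_eval=True this produces nan gradients
    return loss

get_loss_and_backwards(model, train_eval=False)  # works
get_loss_and_backwards(model, train_eval=True)   # fails with nan
```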
Overview:

- `get_loss_and_backwards(model, train_eval=False)` works
- `get_loss_and_backwards(model, train_eval=True)` fails with the `nan` error above

I use the following decorator to speed up generation when training with huggingface `AutoModel`s (sketched below). It'd be great if this decorator worked with unsloth as well! (I noticed a 3x speedup in unsloth generation with `model.eval()` set.)
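A sketch of the decorator (simplified; the name and exact structure are placeholders, and it assumes an ordinary PyTorch/HF model exposing `.generate()` and `.training`):

```python
import functools
import torch

def eval_mode_generation(fn):
    """Run the wrapped generation call in eval/inference mode, then restore training mode."""
    @functools.wraps(fn)
    def wrapper(model, *args, **kwargs):
        was_training = model.training
        model.eval()
        try:
            with torch.inference_mode():
                return fn(model, *args, **kwargs)
        finally:
            if was_training:
                # This eval() -> train() round trip is what currently breaks
                # the Unsloth backward pass when pad tokens are present.
                model.train()
    return wrapper

@eval_mode_generation
def generate(model, **generate_kwargs):
    return model.generate(**generate_kwargs)
```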