Open ml-maddi opened 5 months ago
@ml-maddi Can you screenshot Unsloth's info section?
@ml-maddi Oh that's long! It seems like your local CUDA installation might be broken. Do non-Unsloth code paths work as expected, or is this just an Unsloth issue? (i.e. pure HF / TRL training scripts)
I have been using Unsloth to train a chatbot on the Mistral 7B Instruct model. I have successfully fine-tuned it and get sensible inference results.
The only issue I am having is that the model never stops generating: it keeps producing tokens until "max_new_tokens" is reached instead of ending the answer with the "eos_token". The output itself looks correct, but there is always more text coming, so the answer just gets cut off.
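One common cause of this behaviour (an assumption here, not confirmed from the thread) is that the training examples never end with the EOS token, so the model never learns to emit it. A minimal plain-Python sketch of appending EOS during data formatting; `EOS_TOKEN` is Mistral's `</s>` and the function name is hypothetical:

```python
# Sketch: make sure every training example ends with the EOS token so the
# model learns to stop. EOS_TOKEN here is Mistral's "</s>"; in a real
# pipeline you would use tokenizer.eos_token instead of hard-coding it.
EOS_TOKEN = "</s>"

def format_example(instruction: str, response: str) -> str:
    """Build a Mistral-instruct style training string terminated by EOS."""
    text = f"[INST] {instruction} [/INST] {response}"
    if not text.endswith(EOS_TOKEN):
        text += EOS_TOKEN
    return text

sample = format_example("What is 2 + 2?", "4")
# sample ends with "</s>", so the loss includes the stop token.
```

If the formatting function used during fine-tuning drops the EOS token (or the data collator masks it out), generation will run to the `max_new_tokens` limit exactly as described.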
To fix this, I tried setting tokenizer.padding_side = "left" and tokenizer.truncation_side = "left", as mentioned here: wrong_padding_side. But after continuing fine-tuning with this modification from the previously fine-tuned checkpoint (which was trained with padding_side = "right"), I get the following CUDA error after a few training steps (around step 35 to 39):
```
CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, CUDA_R_32F,
```
How can I solve this CUDA issue? Or, if I have to keep padding_side = "right", how can I ensure the model finishes its answer with a proper ending within the "max_new_tokens" limit?
Note: the modified fine-tuning seems to run fine on Kaggle, but I hit this error when fine-tuning locally on an RTX 3090 GPU.
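For context on why the padding side matters for decoder-only models: with batched generation, left padding keeps each prompt's real tokens at the end of the sequence, directly adjacent to where new tokens are generated; right padding leaves pad tokens between the prompt and the continuation. A minimal sketch of the difference using plain lists of token ids (pad id 0 is an assumption, real code would use tokenizer.pad_token_id):

```python
def pad_batch(seqs, pad_id=0, side="left"):
    """Pad variable-length token-id lists to a common length.

    side="left" prepends pad ids (what decoder-only generation expects);
    side="right" appends them (common default for training).
    """
    max_len = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pad = [pad_id] * (max_len - len(s))
        out.append(pad + s if side == "left" else s + pad)
    return out

batch = [[5, 6], [7, 8, 9]]
left = pad_batch(batch, side="left")    # [[0, 5, 6], [7, 8, 9]]
right = pad_batch(batch, side="right")  # [[5, 6, 0], [7, 8, 9]]
```

With right padding, generation for the first sequence would continue after the pad token at position 2, which is why left padding is recommended at inference time. Note this is an inference-time setting; the CUDA error above during training is likely a separate problem.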