Open ngun7 opened 1 year ago
NVM, I restarted the notebook and it works now 😅 Quick question, though: it currently takes about 25h to train for 3 epochs on my custom 21k QA pairs on a g5.24xlarge. Is there any way to speed this up?
Again, thanks for all your articles on flan-t5 @philschmid 🙌
I'm trying to fine-tune flan-t5-xxl on a custom QA task; thanks for the detailed article on PEFT. However, I'm encountering an error:
The mat values in the error change each time I run trainer.train().
To rule out an issue with my custom dataset, I ran your notebook without changing any code, and it still failed with the above matmul error. Sometimes I get
Apologies, I searched online for a fix but had no luck.
Versions: "transformers==4.27.1" "datasets==2.9.0" "accelerate==0.17.1" "evaluate==0.4.0" "bitsandbytes==0.37.1"
SageMaker notebook instance: ml.g5.24xlarge
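For anyone trying to reproduce this, here is a minimal sketch for checking that the running kernel actually has the pinned versions listed above. It only uses the standard library's `importlib.metadata`. The names `PINNED` and `find_mismatches` are made up for illustration and do not come from the notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the version list above.
PINNED = {
    "transformers": "4.27.1",
    "datasets": "2.9.0",
    "accelerate": "0.17.1",
    "evaluate": "0.4.0",
    "bitsandbytes": "0.37.1",
}


def find_mismatches(pins):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is missing from the environment.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Running this in a notebook cell after a kernel restart shows whether the environment matches the pins. An empty result means every pinned package is installed at the listed version.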