emuchogu opened 4 months ago
I can confirm that this issue also occurs on Kaggle when using a P100 GPU. I tested using the sample notebook provided at https://www.kaggle.com/code/danielhanchen/kaggle-mistral-nemo-12b-unsloth-notebook
Interestingly, the same code works fine when using a T4 GPU on Kaggle. This suggests that the issue might be specific to certain GPU architectures.
Yep, sorry - P100s and P40s are not recommended. I think the very first Unsloth version might have worked, but I would recommend T4s (they also have tensor cores, so matrix multiplications are around 4x faster than on P100s)
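A quick way to check whether the GPU Kaggle allocated is Pascal-class (P100/P40, compute capability 6.x, no tensor cores) or Turing like the T4 (7.5, with tensor cores) is to query the compute capability from PyTorch. This is just an illustrative sketch, not something from the Unsloth notebook:

import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
    # Pascal (P100 = 6.0, P40 = 6.1) has no tensor cores; Turing (T4 = 7.5) does.
    if major < 7:
        print("Pascal-class GPU - Unsloth training is not recommended here.")
else:
    print("No CUDA device visible.")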
Hi, I managed to get it to train on my P40 in Docker; I've posted my solution here: https://github.com/unslothai/unsloth/issues/512#issuecomment-2254623639
I'm experiencing issues when attempting to run
trainer_stats = trainer.train()
on Pascal architecture GPUs (confirmed on P40, previously reported on P100). The training process fails, while the same code runs without issues on T4 GPUs on Kaggle.
Related Issue
This issue appears to be related to https://github.com/unslothai/unsloth/issues/516#issuecomment-2127133305, where a similar problem was reported with a P100 GPU.
My Environment
Setup Details
Steps to Reproduce
trainer_stats = trainer.train()
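For context, that call is the last step of the usual Unsloth fine-tuning flow from the Kaggle notebook. The sketch below reconstructs that flow from memory, so the model name, dataset, and hyperparameters are placeholders rather than an exact copy of the notebook, and the argument layout follows the trl version used in those notebooks at the time:

from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Placeholder model name - substitute whatever the notebook actually loads.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Mistral-Nemo-Base-2407-bnb-4bit",  # assumed 4-bit NeMo checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def to_text(example):
    # Minimal prompt formatting so SFTTrainer has a single "text" column to train on.
    return {"text": f"### Instruction:\n{example['instruction']}\n\n"
                    f"### Input:\n{example['input']}\n\n"
                    f"### Response:\n{example['output']}"}

# Example dataset used for illustration only.
dataset = load_dataset("yahma/alpaca-cleaned", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        output_dir="outputs",
    ),
)

trainer_stats = trainer.train()  # fails on P40/P100, completes on T4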
Current Behavior
The training process fails when running
trainer_stats = trainer.train()
on the P40 GPU.
Expected Behavior
The training process should complete successfully, as it does on T4 GPUs.
Additional Notes
Questions