unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Training Failure on Pascal GPUs (P40, P100) #813

Open emuchogu opened 1 month ago

emuchogu commented 1 month ago

I'm experiencing issues when attempting to run trainer_stats = trainer.train() on Pascal-architecture GPUs (confirmed on a P40, previously reported on a P100). Training fails on these cards, while the same code runs without issues on T4 GPUs on Kaggle.

Related Issue

This appears to be related to issue #516, where a similar problem was reported with a P100 GPU: https://github.com/unslothai/unsloth/issues/516#issuecomment-2127133305

My Environment

Setup Details

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth: Fast Mistral patching release 2024.7
   \\   /|    GPU: Tesla P40. Max memory: 23.879 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 6.1. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth

Steps to Reproduce

  1. Set up the environment as described above
  2. Run the notebook: https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing
  3. Attempt to execute trainer_stats = trainer.train() (a condensed sketch of this flow follows below)
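For readers who don't want to open the notebook, here is a condensed sketch of the flow it runs; the model name, dataset, and hyperparameters below are placeholders, not necessarily the notebook's exact values:

```python
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # placeholder model
    max_seq_length=2048,
    dtype=None,            # auto-detect: falls back to float16 on Pascal (no bf16)
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

dataset = load_dataset("yahma/alpaca-cleaned", split="train")
# Collapse instruction/output into one "text" field (simplified formatting).
dataset = dataset.map(lambda ex: {"text": ex["instruction"] + "\n" + ex["output"]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        max_steps=10,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        output_dir="outputs",
    ),
)

trainer_stats = trainer.train()  # fails here on P40/P100, works on a T4
```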

Current Behavior

The training process fails when running trainer_stats = trainer.train() on the P40 GPU.

Expected Behavior

The training process should complete successfully, as it does on T4 GPUs.

Questions

  1. Are there any known compatibility issues between the current setup and Pascal architecture GPUs?
  2. Are there any workarounds or configuration changes that could resolve this issue for Pascal GPUs? (A tentative sketch follows below.)
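On question 2, one configuration-level idea - a sketch only; dtype is a real FastLanguageModel.from_pretrained argument, but whether pinning it avoids the Pascal failure is an assumption, not a confirmed workaround:

```python
import torch
from unsloth import FastLanguageModel

# Pin float16 explicitly instead of relying on dtype=None auto-detection,
# since Pascal (compute capability 6.x) has no bfloat16 support.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # placeholder model
    max_seq_length=2048,
    dtype=torch.float16,
    load_in_4bit=True,
)
# ...and keep fp16=True, bf16=False in TrainingArguments.
```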
emuchogu commented 1 month ago

I can confirm that this issue also occurs on Kaggle when using a P100 GPU. I tested using the sample notebook provided at https://www.kaggle.com/code/danielhanchen/kaggle-mistral-nemo-12b-unsloth-notebook

Interestingly, the same code works fine when using a T4 GPU on Kaggle. This suggests that the issue might be specific to certain GPU architectures.

danielhanchen commented 1 month ago

Yep, sorry - P100s and P40s are not recommended. I think the very first Unsloth version might have worked, but I would recommend T4s (they also have tensor cores, so matrix multiplications are about 4x faster than on P100s).

emuchogu commented 1 month ago

Hi, I managed to get it to train on my P40 in Docker. I've posted my solution here: https://github.com/unslothai/unsloth/issues/512#issuecomment-2254623639