unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Training Failure on Pascal GPUs (P40, P100) #813

Open emuchogu opened 1 month ago

emuchogu commented 1 month ago

I'm experiencing issues when attempting to run trainer_stats = trainer.train() on Pascal-architecture GPUs (confirmed on a P40, previously reported on a P100). Training fails on these cards, while the same code runs without issues on T4 GPUs on Kaggle.

Related Issue

This appears to be related to issue #516, where a similar problem was reported with a P100 GPU: https://github.com/unslothai/unsloth/issues/516#issuecomment-2127133305

My Environment

Setup Details

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth: Fast Mistral patching release 2024.7
   \\   /|    GPU: Tesla P40. Max memory: 23.879 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 6.1. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth

Steps to Reproduce

  1. Set up the environment as described above
  2. Run the notebook: https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing
  3. Attempt to execute trainer_stats = trainer.train() (a condensed sketch of this flow follows below)
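For readers who don't want to open the notebook, here is a condensed sketch of the flow it runs; the model name, dataset, and hyperparameters below are placeholders, not necessarily the notebook's exact values:

```python
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # placeholder model
    max_seq_length=2048,
    dtype=None,            # auto-detect: falls back to float16 on Pascal (no bf16)
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

dataset = load_dataset("yahma/alpaca-cleaned", split="train")
# Collapse instruction/output into one "text" field (simplified formatting).
dataset = dataset.map(lambda ex: {"text": ex["instruction"] + "\n" + ex["output"]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        max_steps=10,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        output_dir="outputs",
    ),
)

trainer_stats = trainer.train()  # fails here on P40/P100, works on a T4
```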

Current Behavior

The training process fails when running trainer_stats = trainer.train() on the P40 GPU.

Expected Behavior

The training process should complete successfully, as it does on T4 GPUs.

Questions

  1. Are there any known compatibility issues between the current setup and Pascal architecture GPUs?
  2. Are there any workarounds or configuration changes that could resolve this issue for Pascal GPUs? (A tentative sketch follows below.)
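On question 2, one configuration-level idea - a sketch only; dtype is a real FastLanguageModel.from_pretrained argument, but whether pinning it avoids the Pascal failure is an assumption, not a confirmed workaround:

```python
import torch
from unsloth import FastLanguageModel

# Pin float16 explicitly instead of relying on dtype=None auto-detection,
# since Pascal (compute capability 6.x) has no bfloat16 support.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # placeholder model
    max_seq_length=2048,
    dtype=torch.float16,
    load_in_4bit=True,
)
# ...and keep fp16=True, bf16=False in TrainingArguments.
```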
emuchogu commented 1 month ago

I can confirm that this issue also occurs on Kaggle when using a P100 GPU. I tested using the sample notebook provided at https://www.kaggle.com/code/danielhanchen/kaggle-mistral-nemo-12b-unsloth-notebook

Interestingly, the same code works fine when using a T4 GPU on Kaggle. This suggests that the issue might be specific to certain GPU architectures.

danielhanchen commented 1 month ago

Yep, sorry - P100s and P40s are not recommended. I think the very first Unsloth version might have worked, but I would recommend T4s (they also have tensor cores, so matrix multiplications are about 4x faster than on P100s).

emuchogu commented 1 month ago

Hi, I managed to get it to train on my P40 in Docker. I've posted my solution here: https://github.com/unslothai/unsloth/issues/512#issuecomment-2254623639