unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Fix overly sensitive "Unsloth currently does not support multi GPU setups" error when training on a single GPU in a multi-GPU environment. #1295

Open giuliabaldini opened 1 week ago

giuliabaldini commented 1 week ago

Hi there,

This PR has the changes requested in #974. Unfortunately, I don't have a system where I can test this myself, but I have been testing it with other people on a cluster that has multiple GPUs.

The only problem is that the fix at llama.py:1694 does not seem to work, as we are still getting the error. To make it run, we have actually removed that check. Any ideas on how to fix this? Is it problematic to remove the check there?

@hife-ai @Datta0 @Sehyo

danielhanchen commented 4 days ago

Will re-investigate this - apologies on the delay!

Datta0 commented 4 days ago

Btw, just thinking out loud (or thinking as written text): should we consolidate all these multi-GPU errors into a single function? Right now I see there's check_nvidia and another check in from_pretrained.
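
For illustration, a consolidated check could look roughly like the sketch below (the function name and its placement are hypothetical, not unsloth's actual API). The idea is to count only the GPUs the process can actually see, so the error is not raised just because the machine has several GPUs installed:

# Hypothetical sketch of a single consolidated multi-GPU check.
# It counts the devices visible to this process (respecting
# CUDA_VISIBLE_DEVICES) instead of every GPU installed in the machine.
import os
import torch

def check_single_gpu() -> None:
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        n_gpus = len([d for d in visible.split(",") if d.strip() != ""])
    else:
        n_gpus = torch.cuda.device_count()
    if n_gpus > 1:
        raise RuntimeError(
            "Unsloth currently does not support multi GPU setups - but we are working on it!"
        )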

giuliabaldini commented 4 days ago

@Datta0, yeah, I definitely agree. However, I am not incredibly familiar with patching functions this way; wouldn't the function have to be part of all the patched code, meaning that we would have to rewrite it every time?
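
On the patching question, here is a purely hypothetical illustration (not how unsloth actually builds its patched code) of why a shared helper would not necessarily have to be rewritten in every template: code compiled from a string can call the helper as long as it is present in the namespace the string is exec'd into.

# Hypothetical sketch: dynamically generated code reusing a shared helper.
# check_single_gpu is the helper from the sketch above; only the call site
# lives in the generated template, not the check itself.
generated_source = (
    "def train():\n"
    "    check_single_gpu()\n"
    "    print('training...')\n"
)

namespace = {"check_single_gpu": check_single_gpu}
exec(compile(generated_source, "<string>", "exec"), namespace)
namespace["train"]()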

Peter-Fy commented 3 days ago

I tried deleting the check code in tokenizer_utils.py and llama.py, but I’m still getting the following error:

Traceback (most recent call last):
  File "/home/fdf/dpo_finetune.py", line 116, in <module>
    main()
  File "/home/fdf/dpo_finetune.py", line 108, in main
    trainer.train()
  File "<string>", line 40, in train
RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!

However, I don't know which line in unsloth triggers this error, so I can't delete the remaining check code.
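
A quick way to locate the remaining checks (just a sketch, assuming the error message text is unchanged) is to search the installed unsloth sources for the string. Note that a check raised from dynamically compiled code (the File "<string>" frame in the traceback above) will only show up where that code string is generated:

# Quick sketch: search the installed unsloth package for the multi-GPU
# error string to find where the remaining checks live.
import importlib.util
import pathlib

spec = importlib.util.find_spec("unsloth")
package_dir = pathlib.Path(spec.origin).parent
for path in package_dir.rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if "does not support multi GPU" in line:
            print(f"{path}:{lineno}: {line.strip()}")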

Sehyo commented 3 days ago

Hi guys, I have been a bit busy. I can submit a version with all the fixes on either Thursday or Friday; I have a hectic schedule until then.

giuliabaldini commented 2 days ago

Hi @Peter-Fy, did you try to install unsloth from this PR branch? Do you still get the error?

Peter-Fy commented 1 day ago

Hi @Peter-Fy, did you try to install unsloth from this PR branch? Do you still get the error?

Yes, I installed unsloth from this PR branch, but I still get an error like this:

Traceback (most recent call last):
  File "/home/fdf/qlora_finetune.py", line 133, in <module>
    main()
  File "/home/fdf/qlora_finetune.py", line 125, in main
    trainer.train()
  File "<string>", line 39, in train
RuntimeError: tokenizer_utils.py:971 Unsloth currently does not support multi GPU setups - but we are working on it!

So I deleted the check code at tokenizer_utils.py:971, but then I get another error:

Traceback (most recent call last):
  File "/home/fdf/qlora_finetune.py", line 133, in <module>
    main()
  File "/home/fdf/qlora_finetune.py", line 125, in main
    trainer.train()
  File "<string>", line 40, in train
RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!

Peter-Fy commented 1 day ago

Hi guys, I have been a bit busy. I can submit a version with all the fixes on either Thursday or Friday; I have a hectic schedule until then.

That would be helpful. Looking forward to your fixes.