xtchen96 opened this issue 5 months ago
Oh, currently Unsloth does not support multi-GPU, sorry - our enterprise plans have it for now - we're currently concentrating on adding Ollama support, Llama-3 bug fixes, all model support and more in the OSS version
@danielhanchen Is there a way to run Unsloth on only 1 GPU when I have a 2 GPU node? I get the same error, and I want to use only 1 GPU since the model easily fits on it. I tried
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
But it did not work
Export it via shell before running the python script
Yep you have to set the env variable before running Unsloth
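For reference, a minimal sketch of both approaches (train.py here is a placeholder for your own script); the key point is that the variable must be set before torch or Unsloth is imported, because CUDA only reads it once at initialization:

import os
# Must run before torch/unsloth is imported: CUDA reads
# CUDA_VISIBLE_DEVICES only once, when it is first initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # should now report 1

Or equivalently from the shell, before launching the script:

export CUDA_VISIBLE_DEVICES=0 && python train.py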
Setting the env variable before running Unsloth still does not resolve the problem.
Used: export CUDA_VISIBLE_DEVICES=0 but it still comes up with the error: RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.
Also used: export CUDA_VISIBLE_DEVICES=1 but same problem.
@danielhanchen I am confused, kindly help. The error is asking me to get a commercial licence.
@miary what GPUs are you using and are they already running another job?
@danielhanchen I have 2 GPUs, both RTX 3090. This runtime error about more than 1 GPU is a brand new issue that came from Unsloth 2024.6.
I have a project that is using Unsloth 2024.5 and it works just fine.
It is completely fine if Unsloth wants to charge for environments with more than one GPU. However, the option should be given to use only one GPU, which is what setting the CUDA_VISIBLE_DEVICES env var is supposed to do, but it's apparently broken. This looks like a really bad bug because it breaks the entire project.
Hmm I shall investigate this hmmm.
How do you all call Unsloth? Via the terminal as a python script? Via Jupyter?
I am using a Python script and had the same issue while trying to run on GPU 1 (if I set the code to have visibility only on GPU 0, it works fine).
I am using these as the first lines in my main code:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # order devices by PCI bus ID so ids match nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "1"        # expose only the second GPU
os.environ["GRADIO_SHARE"] = "1"
os.environ["WORLD_SIZE"] = "1"                  # single-process run
@Chirobocea So do you use python train.py or something like torchrun?
Usually I use python train.py. However, I just tried to launch it with torchrun and it has the same issue. I also checked with the debugger that torch indeed sees only one GPU, which is remapped to id 0 for the running code, and while the model is loading it takes VRAM only from GPU 1, as expected (according to nvidia-smi).
Ok thanks for the info! Running in runpod to see what I can do! :)
@miary @Chirobocea @aflah02 Just fixed it! Hopefully it now can work! Apologies on the issues! Please update Unsloth via
pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.
Thanks for all your work, btw! Killer project!
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,029 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040
Traceback (most recent call last):
File "/home/matto/projects/baby-code/workspace/unsloth-orpo.py", line 128, in
Can confirm it does not occur with unsloth-2024.5 but does with unsloth-2024.6. If necessary, one can downgrade via: pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git@f9689b1
@molander Do you know if my latest fix fixes stuff?
@danielhanchen no, as soon as I uninstall and pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git, it comes back :(
@miary @Chirobocea @aflah02 Just fixed it! Hopefully it now can work! Apologies on the issues! Please update Unsloth via
pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
This patch did not solve the problem. Same error: RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.
Hmmm weird, I tried it in Runpod with 4x GPUs and it worked - I shall retry fixing this! Sorry everyone for the issue!
@miary @molander I updated the package again! Apologies on the issues!
I found the below to work (change 1 to any device id)
export CUDA_VISIBLE_DEVICES=1 && python train_file.py
Likewise torchrun also works with that approach.
Hope this works! Thank you for your patience!
@miary @Chirobocea @aflah02 Just fixed it! Hopefully it now can work! Apologies on the issues! Please update Unsloth via
pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
I tried this one hour ago and checked. It seems that the main problem appears when I have something running on the other GPU as well. For example, if I have other code running in another env on GPU 0, I can't run Unsloth on GPU 1. The error is the same as before.
Confirmed not working as intended. With nothing running on GPU 1, it still will not run, even though Num GPUs shows as 1 in the Unsloth banner below.
But on GPU 0, after I closed everything but Xorg, it worked.
So at first glance it would appear it must use GPU 0, in which case you have a legitimate workaround; ticket closed, back to the real work ;)
Thank you for open-sourcing. I know that it takes big balls, and I assure you, it's worth it all the way around ;)
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,029 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040
@miary @molander I updated the package again! Apologies on the issues!
I found the below to work (change 1 to any device id)
export CUDA_VISIBLE_DEVICES=1 && python train_file.py
Likewise torchrun also works with that approach. Hope this works! Thank you for your patience!
@danielhanchen Just wanted to confirm that your patch by including export CUDA_VISIBLE_DEVICES=1 works!!! Thanks for all the good work, greatly appreciated!
@miary Great it worked!
@molander Thanks, glad it's a workaround - I'll see what I can do. So export CUDA_VISIBLE_DEVICES=1 && python train_file.py still does not work? Do you use torchrun or python or accelerate?
@danielhanchen Good to go here! I made a new conda env and conda installed pytorch, transformers, etc and it's working like a mule at the grand canyon! Thanks!
Thanks @danielhanchen!!
This still isn't working for me. @danielhanchen can you please remove the exception when more than one GPU has over 4 GB of memory in use?
@molander Did you do anything custom on your end? Looking at the main branch, the code is still there.
I'm on a node with multiple GPUs, but I only have one in CUDA_VISIBLE_DEVICES.
The issue I'm having is with these lines in the patch_sft_trainer_tokenizer() function of tokenizer_utils.py:
https://github.com/unslothai/unsloth/blob/933d9fe2cb2459f949ee2250e90a5b610d277eab/unsloth/tokenizer_utils.py#L961-L970
The check for multiple GPUs here is really a count of how many GPUs on the node are using > 4 GB of memory. This is going to fail for anyone on a busy shared node.
I removed that check, and a similar check in llama.py:
https://github.com/unslothai/unsloth/blob/933d9fe2cb2459f949ee2250e90a5b610d277eab/unsloth/models/llama.py#L1198-L1207
Then I was able to run unsloth on my node.
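For anyone reading along, the check being removed is roughly of this shape (a simplified sketch of what is being described, not the actual Unsloth source): every physical GPU whose used VRAM exceeds about 4 GB is counted, and the count covers all GPUs on the node, so other users' jobs on a shared machine are enough to trip it even when CUDA_VISIBLE_DEVICES exposes only one device.

import subprocess

# Simplified sketch, not the actual Unsloth code: count every physical GPU
# whose used VRAM exceeds ~4 GB and refuse to run if more than one does.
# nvidia-smi reports all GPUs regardless of CUDA_VISIBLE_DEVICES, which is
# why a busy shared node can trigger this even with one visible device.
def count_busy_gpus(threshold_mib=4096):
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return sum(int(line) > threshold_mib for line in out.splitlines() if line.strip())

if count_busy_gpus() > 1:
    raise RuntimeError(
        "Error: More than 1 GPUs have a lot of VRAM usage. "
        "Please obtain a commercial license."
    )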
Many, many apologies for the delay! My brother and I just relocated to SF, so I just got back to GitHub issues!
As per the discussion here, I will instead convert it to a warning saying that Unsloth is not yet functional for multi-GPU setups, and will still allow the finetuning process to go through (especially for shared servers)
As requested, I made it into a warning instead and not an error :) Please update Unsloth and try it out! Hope it works now!
pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
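Roughly what the change amounts to (an illustrative sketch, not the actual diff): the same multi-GPU condition now emits a warning and lets finetuning continue instead of raising.

import warnings

# Illustrative sketch only, not the actual Unsloth patch: when more than
# one GPU looks busy, warn that multi-GPU is unsupported and keep going
# instead of raising a RuntimeError.
def check_multi_gpu(num_busy_gpus):
    if num_busy_gpus > 1:
        warnings.warn(
            "Unsloth is not yet functional for multi-GPU setups; "
            "continuing with single-GPU finetuning."
        )

check_multi_gpu(num_busy_gpus=2)  # hypothetical value; emits the warning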
Same error in
Name: unsloth
Version: 2024.8
script: export CUDA_VISIBLE_DEVICES=0 && python naive_train.py --model gemma_2b --dataset alpaca_gpt4 --lora
As requested, I made it into a warning instead and not an error :) Please update Unsloth and try it out! Hope it works now!
pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
Also tried this :(
Oh no it still doesn't work? I'll look into it sorry
Same issue.
I have to apply @vvatter's fix on the latest version to bypass the runtime error.
any updates?
I'm on a node with multiple GPUs, but I only have one in CUDA_VISIBLE_DEVICES.
The issue I'm having is with these lines in the patch_sft_trainer_tokenizer() function of tokenizer_utils.py: the check for multiple GPUs here is really a count of how many GPUs on the node are using > 4 GB of memory. This is going to fail for anyone on a busy shared node.
I removed that check, and a similar check in llama.py. Then I was able to run unsloth on my node.
This solution works for me.
I saw this error message when trying to do supervised fine-tuning with 4x A100 GPUs. So the free version cannot be used on multiple GPUs?
RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.