unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

CUDA_VISIBLE_DEVICES not functioning #660

Open xtchen96 opened 5 months ago

xtchen96 commented 5 months ago

I saw this error message when trying to do supervised fine-tuning with 4x A100 GPUs. Does this mean the free version cannot be used on multiple GPUs?

RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

danielhanchen commented 5 months ago

Oh, Unsloth does not currently support multi-GPU training, sorry - our enterprise plans have it for now. We're currently concentrating on adding Ollama support, Llama-3 bug fixes, support for all models, and more in the OSS version.

aflah02 commented 4 months ago

@danielhanchen Is there a way to run Unsloth on only 1 GPU when I have a 2-GPU node? I get the same error and I want to use only 1 GPU, since the model easily fits on it. I tried

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

But it did not work

erwe324 commented 4 months ago

Export it via the shell before running the Python script.

danielhanchen commented 4 months ago

Yep, you have to set the env variable before running Unsloth.
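In practice that means something like this (the script name is just a placeholder):

export CUDA_VISIBLE_DEVICES=0
python train.py

# or as a one-liner
CUDA_VISIBLE_DEVICES=0 python train.py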

miary commented 4 months ago

Setting the env variable before running Unsloth still does not resolve the problem.

I used export CUDA_VISIBLE_DEVICES=0, but it still comes up with the error: RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

I also tried export CUDA_VISIBLE_DEVICES=1, but hit the same problem.

erwe324 commented 4 months ago

@danielhanchen I am confused, kindly help. The error is asking me to get a commercial license.

@miary what GPUs are you using and are they already running another job?

miary commented 4 months ago

@danielhanchen I have 2 GPUs, both RTX 3090. This runtime error about more than 1 GPU is a brand new issue that came from Unsloth 2024.6.

I have a project that is using Unsloth 2024.5 and it works just fine.

It is completely fine if Unsloth wants to charge for environments with more than one GPU. However, the option should be given to use only one GPU, which is what setting the CUDA_VISIBLE_DEVICES env variable is supposed to do, but it's apparently broken. This looks like a really bad bug because it breaks the entire project.

danielhanchen commented 4 months ago

Hmm I shall investigate this hmmm.

How do you all call Unsloth? Via the terminal as a python script? Via Jupyter?

Chirobocea commented 4 months ago

I am using a Python script and had the same issue while trying to run on GPU 1 (if I set the code to have visibility only on GPU 0, it works fine).

I am using these as the first lines in my main code:

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # enumerate GPUs in PCI bus order so device 1 matches nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "1"        # expose only GPU 1 to this process
os.environ["GRADIO_SHARE"] = "1"
os.environ["WORLD_SIZE"] = "1"

danielhanchen commented 4 months ago

@Chirobocea So do you use python train.py or like torchrun?

Chirobocea commented 4 months ago

Usually I use python train.py. However, I just tried to launch it with torchrun and it has the same issue. I also checked with the debugger that torch indeed sees only one GPU, which is renamed to id 0 for the running code, while during model loading it takes VRAM only from GPU 1 as expected (according to nvidia-smi).

danielhanchen commented 4 months ago

Ok thanks for the info! Running in runpod to see what I can do! :)

danielhanchen commented 4 months ago

@miary @Chirobocea @aflah02 Just fixed it! Hopefully it now can work! Apologies on the issues! Please update Unsloth via

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

molander commented 4 months ago

RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

Thanks for all your work, btw! Killer project!

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,029 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040
Traceback (most recent call last):
  File "/home/matto/projects/baby-code/workspace/unsloth-orpo.py", line 128, in <module>
    orpo_trainer.train()
  File "/home/matto/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "<string>", line 226, in _fast_inner_training_loop
RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

molander commented 4 months ago

Can confirm it does not occur with unsloth-2024.5 but does with unsloth-2024.6. If necessary, one can downgrade via: pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git@f9689b1

danielhanchen commented 4 months ago

@molander Do you know if my latest fix fixes stuff?

molander commented 4 months ago

@danielhanchen no, as soon as I uninstall and pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git, it comes back :(

miary commented 4 months ago

@miary @Chirobocea @aflah02 Just fixed it! Hopefully it now can work! Apologies on the issues! Please update Unsloth via

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

This patch did not solve the problem. Same error: RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

danielhanchen commented 4 months ago

Hmmm, weird - I tried it in runpod with 4x GPUs and it worked. I shall retry fixing this! Sorry everyone for the issue!

danielhanchen commented 4 months ago

@miary @molander I updated the package again! Apologies on the issues!

I found the below to work (change 1 to any device id)

export CUDA_VISIBLE_DEVICES=1 && python train_file.py

Likewise torchrun also works with that approach.
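For example (script name and device id are placeholders):

export CUDA_VISIBLE_DEVICES=1 && torchrun --nproc_per_node=1 train_file.py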

Hope this works! Thank you for your patience!

Chirobocea commented 4 months ago

@miary @Chirobocea @aflah02 Just fixed it! Hopefully it now can work! Apologies on the issues! Please update Unsloth via

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

I tried this one hour ago and checked. It seems that the main problem appears when I have something running on the other GPU as well. For example, if I have another job with another env running on GPU 0, I can't run Unsloth on GPU 1. The error is the same as before.

molander commented 4 months ago

Confirmed not working as intended. With nothing running on GPU 1, it still will not run, even though Num GPUs shows as 1 in the Unsloth banner below.

But on GPU = 0, after I closed everything but Xorg, it worked.

So, at first glance it would appear it must use GPU 0, in which case you have a legitimate workaround; ticket closed, back to the real work ;)

Thank you for open-sourcing. I know that it takes big balls, and I assure you, it's worth it all the way around ;)

max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,029 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040

miary commented 4 months ago

@miary @molander I updated the package again! Apologies on the issues!

I found the below to work (change 1 to any device id)

export CUDA_VISIBLE_DEVICES=1 && python train_file.py

Likewise torchrun also works with that approach.

Hope this works! Thank you for your patience!

@danielhanchen Just wanted to confirm that your patch by including export CUDA_VISIBLE_DEVICES=1 works!!! Thanks for all the good work, greatly appreciated!

danielhanchen commented 4 months ago

@miary Great it worked!

@molander Thanks, glad there's a workaround - I'll see what I can do. So export CUDA_VISIBLE_DEVICES=1 && python train_file.py still does not work? Do you use torchrun, python, or accelerate?

molander commented 4 months ago

@danielhanchen Good to go here! I made a new conda env and conda installed pytorch, transformers, etc and it's working like a mule at the grand canyon! Thanks!

aflah02 commented 4 months ago

Thanks @danielhanchen!!

user799595 commented 4 months ago

This still isn't working for me. @danielhanchen can you please remove the exception when more than one GPU has over 4 GB of memory in use?

@molander Did you do anything custom on your end? Looking at the main branch, the code is still there.

vvatter commented 4 months ago

I'm on a node with multiple GPUs, but I only have one in CUDA_VISIBLE_DEVICES.

The issue I'm having is with these lines in the patch_sft_trainer_tokenizer() function of tokenizer_utils.py: https://github.com/unslothai/unsloth/blob/933d9fe2cb2459f949ee2250e90a5b610d277eab/unsloth/tokenizer_utils.py#L961-L970

The check for multiple GPUs here is really a count of how many GPUs on the node are using more than 4 GB of memory. This is going to fail for anyone on a busy shared node.

I removed that check, and a similar check in llama.py: https://github.com/unslothai/unsloth/blob/933d9fe2cb2459f949ee2250e90a5b610d277eab/unsloth/models/llama.py#L1198-L1207

Then I was able to run unsloth on my node.
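For anyone trying to understand the failure mode, here's a rough sketch of the kind of node-wide check being described - illustrative only, not Unsloth's actual code - using pynvml to count GPUs with more than 4 GB in use:

import pynvml

pynvml.nvmlInit()
busy_gpus = 0
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    if mem.used > 4 * 1024**3:  # this GPU already has more than 4 GB in use
        busy_gpus += 1
pynvml.nvmlShutdown()

# A node-wide count like this ignores CUDA_VISIBLE_DEVICES, so on a busy shared
# node it trips even when the training job itself can only see one GPU.
if busy_gpus > 1:
    raise RuntimeError("Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.")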

danielhanchen commented 4 months ago

Many, many apologies for the delay! My brother and I just relocated to SF, so I've only just gotten back to GitHub issues!

As per the discussion here, I will instead convert it to a warning telling people that Unsloth is not yet functional for multi-GPU setups, and will still allow the finetuning process to go through (especially on shared servers).
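Conceptually the change is something along these lines (a sketch, not the actual diff; num_gpus_with_high_vram stands in for whatever the real check computes):

import warnings

if num_gpus_with_high_vram > 1:
    # Previously this raised:
    #   RuntimeError("Error: More than 1 GPUs have a lot of VRAM usage. ...")
    warnings.warn(
        "Multiple GPUs show significant VRAM usage, but Unsloth does not yet "
        "support multi-GPU training. Continuing on a single GPU."
    )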

danielhanchen commented 4 months ago

As requested, I made it into a warning instead and not an error :) Please update Unsloth and try it out! Hope it works now!

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

rangehow commented 3 months ago

Same error in

Name: unsloth
Version: 2024.8

script: export CUDA_VISIBLE_DEVICES=0 && python naive_train.py --model gemma_2b --dataset alpaca_gpt4 --lora

rangehow commented 3 months ago

As requested, I made it into a warning instead and not an error :) Please update Unsloth and try it out! Hope it works now!

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

Also tried this :(

danielhanchen commented 3 months ago

Oh no it still doesn't work? I'll look into it sorry

Galaxy-Husky commented 3 months ago

Same issue.

linan-github commented 1 month ago

I have to apply @vvatter's fix on the latest version to bypass the runtime error.

BenBatsir commented 3 weeks ago

any updates?

quanpr commented 1 week ago

I'm on a node with multiple GPUs, but I only have one in CUDA_VISIBLE_DEVICES.

The issue I'm having is with these lines in the patch_sft_trainer_tokenizer() function of tokenizer_utils.py:

https://github.com/unslothai/unsloth/blob/933d9fe2cb2459f949ee2250e90a5b610d277eab/unsloth/tokenizer_utils.py#L961-L970

The check for multiple GPUs here is really a count of how many GPUs on the node are using more than 4 GB of memory. This is going to fail for anyone on a busy shared node.

I removed that check, and a similar check in llama.py:

https://github.com/unslothai/unsloth/blob/933d9fe2cb2459f949ee2250e90a5b610d277eab/unsloth/models/llama.py#L1198-L1207

Then I was able to run unsloth on my node.

This solution works for me.