unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
18.58k stars 1.3k forks

Cannot use Unsloth on vSphere with an Ubuntu VM (vGPU) #1137

Open NeilL0412 opened 1 month ago

NeilL0412 commented 1 month ago

I have the same problem and cannot use Unsloth. Even when I run the code below, I still get the same error: `os.environ['CUDA_VISIBLE_DEVICES'] = "0"`

Error: ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
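For context, the error is asking either for CPU offload or for the whole model to be placed on the GPU with an explicit `device_map`. A hedged sketch of the latter using the `transformers` API (the model id and parameter values here are illustrative, not a verified fix from this thread):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization config; bfloat16 compute matches the
# "Bfloat16 = TRUE" line in the Unsloth banner below.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2-7B-Instruct-bnb-4bit",  # illustrative model id
    quantization_config=quant_config,
    # {"": 0} pins the entire model to cuda:0 instead of letting
    # accelerate spill layers to CPU/disk; if the model genuinely
    # does not fit, this fails fast with a clear OOM instead.
    device_map={"": 0},
)
```

This is a configuration sketch only; it requires a GPU and a model download to actually run.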

Below is the environment information:

```
==((====))==  Unsloth 2024.10.0: Fast Qwen2 patching. Transformers = 4.44.2.
   \\   /|    GPU: GRID A100X-40C. Max memory: 39.996 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.0. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
```
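One detail worth checking with the `CUDA_VISIBLE_DEVICES` workaround above: the variable only takes effect if it is set before CUDA is initialized, i.e. before `torch` or `unsloth` is imported. A minimal sketch of the required ordering (the import names in the comments are the usual ones, shown as an assumption):

```python
import os

# CUDA reads CUDA_VISIBLE_DEVICES once, when it initializes.
# Setting it after `import torch` is silently ignored, so this
# line must be the first thing the script does.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Only now import the CUDA-dependent libraries:
# import torch
# from unsloth import FastLanguageModel
```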

MuhammadBilal848 commented 1 month ago

I'm having a similar problem. Have you found any solution? In my case I'm loading unsloth/Meta-Llama-3.1-70B-bnb-4bit on Kaggle.

(screenshot of the error attached)

danielhanchen commented 1 month ago

It's possible the model is too large to fit with Unsloth. As a rule of thumb, a 16 GB GPU can fit a 22B model, and a 48 GB GPU can fit a 70B model.
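That rule of thumb roughly matches a back-of-the-envelope estimate: a 4-bit quantized model needs about half a byte per parameter, plus some fixed overhead for the CUDA context, activations, and KV cache. A sketch (the 2 GB overhead figure is an assumption for illustration, not from this thread):

```python
def estimate_4bit_vram_gb(n_params_billion: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a 4-bit quantized model:
    ~0.5 bytes per parameter plus a fixed overhead."""
    return n_params_billion * 0.5 + overhead_gb

# A 7B model needs roughly 5.5 GB -> fits easily on a 40 GB A100X.
print(estimate_4bit_vram_gb(7))   # 5.5
# A 70B model needs roughly 37 GB -> tight on 40 GB, comfortable on 48 GB.
print(estimate_4bit_vram_gb(70))  # 37.0
```

By this estimate the 40 GB vGPU in this thread should hold a 4-bit 7B model with room to spare, which is why the error below is surprising.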

NeilL0412 commented 1 month ago

> It's possible the model is too large to fit with Unsloth. In general a 16GB GPU can fit a 22B model. A 48GB GPU can do 70B.

But I loaded unsloth/Qwen2-7B-Instruct-bnb-4bit and still got the same error.

danielhanchen commented 1 month ago

@NeilL0412 Wait, are you sure? Even a 7B model? That's very weird.