NeilL0412 opened 1 month ago
I'm having a similar problem. Have you found any solutions? In my case I'm loading unsloth/Meta-Llama-3.1-70B-bnb-4bit on Kaggle.
It's possible the model is too large to fit with Unsloth. In general a 16GB GPU can fit a 22B model. A 48GB GPU can do 70B.
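Rough arithmetic behind those figures (a sketch only; the ~0.5 bytes/parameter for 4-bit weights and the flat overhead allowance are my assumptions, not Unsloth internals):

```python
def approx_vram_gb(n_params_billions, bytes_per_param=0.5, overhead_gb=2.0):
    """Very rough 4-bit VRAM estimate: quantized weights plus a flat
    allowance for activations, KV cache, LoRA state, and CUDA context.
    All numbers here are illustrative assumptions."""
    return n_params_billions * bytes_per_param + overhead_gb

print(approx_vram_gb(22))  # ~13 GB -> plausible on a 16 GB card
print(approx_vram_gb(70))  # ~37 GB -> needs a ~40-48 GB card
```

So a 22B model in 4-bit squeezes onto 16 GB, while 70B needs a 40-48 GB card even before training overhead.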
But I loaded unsloth/Qwen2-7B-Instruct-bnb-4bit and still got the same error.
@NeilL0412 Wait, are you sure? Even a 7B model? That's very weird.
I have the same problem and cannot use Unsloth. Even when I run the code below, I still get the same error:
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
Error:
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
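For what it's worth, here is a hedged sketch of the workaround that ValueError suggests, using the plain transformers API rather than Unsloth (whose fast path expects the whole model on the GPU). The module names in the device_map are illustrative only; the real layer names depend on the model architecture:

```python
# Illustrative custom device_map: keep the bulk of the model on GPU 0
# and offload what does not fit to the CPU. Module names below are
# hypothetical examples, not taken from any specific architecture.
device_map = {
    "model.embed_tokens": 0,   # embeddings on GPU 0
    "model.layers": 0,         # transformer blocks on GPU 0
    "model.norm": "cpu",       # offload the remainder
    "lm_head": "cpu",
}

# How it would be passed (not run here; downloads a multi-GB model):
# from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# model = AutoModelForCausalLM.from_pretrained(
#     "unsloth/Qwen2-7B-Instruct-bnb-4bit",
#     quantization_config=BitsAndBytesConfig(
#         load_in_4bit=True,
#         llm_int8_enable_fp32_cpu_offload=True,  # fp32 CPU offload flag
#     ),
#     device_map=device_map,
# )
```

Note this is a sketch of the transformers-level workaround, not a fix for Unsloth itself.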
Below is the information:
Unsloth 2024.10.0: Fast Qwen2 patching. Transformers = 4.44.2.
GPU: GRID A100X-40C. Max memory: 39.996 GB. Platform = Linux.
Pytorch: 2.4.0. CUDA = 8.0. CUDA Toolkit = 12.1.
Bfloat16 = TRUE. FA [Xformers = 0.0.28.post1. FA2 = False]
Free Apache license: http://github.com/unslothai/unsloth
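One note on the CUDA_VISIBLE_DEVICES workaround tried above: the variable only takes effect if it is set before any library initializes CUDA, so import order matters. A minimal ordering sketch (only torch/unsloth names from this thread are assumed):

```python
import os

# CUDA_VISIBLE_DEVICES must be set BEFORE torch or unsloth is imported;
# once the process has initialized CUDA, changing this variable has no
# effect on which devices the process sees.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Only now import the CUDA-using libraries:
# import torch
# from unsloth import FastLanguageModel
```

If the variable is set after `import torch` has already run, the restriction is silently ignored.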