unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Can't load Llama 3.1 70B due to ValueError: Some modules are dispatched on the CPU or the disk. #1139

Open MuhammadBilal848 opened 3 hours ago

MuhammadBilal848 commented 3 hours ago

I am trying to fine-tune Llama 3.1 70B. It has 6 safetensors totaling 39.52 GB.

I have a total of 57.6 GB of disk in Kaggle, but the code raises a ValueError.

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
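For context, the workaround the error message suggests looks roughly like the sketch below (a configuration sketch, not a recommendation: in current transformers the flag is spelled `llm_int8_enable_fp32_cpu_offload` on `BitsAndBytesConfig`, and the model id here is an assumption). Offloaded modules run on the CPU, so even if loading succeeds this way, it is far too slow for finetuning:

```python
# Sketch of the CPU-offload workaround named in the ValueError.
# Assumptions: the meta-llama model id, and that bitsandbytes is installed.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    # Modern spelling of the load_in_8bit_fp32_cpu_offload flag the error names:
    llm_int8_enable_fp32_cpu_offload=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B",   # assumed model id
    quantization_config=bnb_config,
    device_map="auto",                 # or a hand-written {module: device} dict
)
```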

Kaggle session matrix:

(screenshot)

danielhanchen commented 3 hours ago

Oh, Kaggle can only fit 22B Mistral Small. 70B sadly is way too large; you need at least a 48GB GPU card to finetune it with Unsloth!

MuhammadBilal848 commented 2 hours ago

It's not even loading Gemma 2 27B, so how would it load the 22B model?