unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
18.5k stars 1.3k forks

32-bit CPU offloading argument error-parse #216

Open icecoldt369 opened 8 months ago

icecoldt369 commented 8 months ago

Hello, I am trying to download my pretrained model weights and use them for inference in a local notebook. Running the code on Google Colab works fine. However, I encounter this error when attempting to do the same in my local environment. This is the message:

```
File ~/miniconda3/envs/py10/lib/python3.10/site-packages/unsloth/models/loader.py:121, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, use_gradient_checkpointing, *args, **kwargs)
    115 raise NotImplementedError(
    116     f"Unsloth: {model_name} not supported yet!\n"
    117     "Make an issue to https://github.com/unslothai/unsloth!",
    118 )
    119 pass
...
... in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
```

(output truncated)

When I pass the specified argument (`load_in_8bit_fp32_cpu_offload=True`) to `FastLanguageModel.from_pretrained`, it does not recognise the argument. Please let me know how to configure this correctly, thanks!
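
For context, a minimal sketch of the call being attempted, assuming a 4-bit finetuned checkpoint; the model name and `max_seq_length` here are illustrative, not from the thread:

```python
from unsloth import FastLanguageModel

# Sketch of the failing call: the HF error message suggests passing
# load_in_8bit_fp32_cpu_offload, but per this thread Unsloth's loader
# does not recognise that kwarg.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "username/my-finetuned-model",  # illustrative checkpoint
    max_seq_length = 2048,                       # illustrative value
    load_in_4bit = True,
    device_map = "auto",
    load_in_8bit_fp32_cpu_offload = True,        # not recognised -> error
)
```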

danielhanchen commented 8 months ago

@icecoldt369 Oh wait, does your local machine have a GPU? Maybe I'm misunderstanding your error - it seems like your PC either doesn't have enough GPU VRAM or has no GPU at all?

icecoldt369 commented 8 months ago

@danielhanchen Hi! Sorry, it might be a trivial matter, as I'm just not entirely understanding the error. Yes, my local machine runs Ubuntu with an Nvidia GeForce RTX 3060, my CUDA version is 12.3, and I made sure to flush my GPU memory after training before loading my weights. I suspect insufficient GPU VRAM might be the issue; if that is the case, what would you suggest?
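
Not from the thread, but a common way to confirm how much VRAM is actually in use and to flush cached memory in PyTorch, in case that helps pin down whether VRAM is the bottleneck:

```python
import gc
import torch

# Total VRAM on the first GPU vs. what is currently allocated.
props = torch.cuda.get_device_properties(0)
print(f"Total VRAM: {props.total_memory / 1024**3:.2f} GB")
print(f"Allocated:  {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")

# Drop Python references, then return cached blocks to the driver.
gc.collect()
torch.cuda.empty_cache()
```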

danielhanchen commented 8 months ago

@icecoldt369 Did you load the model with `load_in_4bit = True`?

icecoldt369 commented 8 months ago

@danielhanchen Yes, I did not change the kwargs much. It works on Colab but not when running locally. I then tried adding the missing HF argument when loading the model just to check; unsurprisingly, it doesn't work.

danielhanchen commented 8 months ago

@icecoldt369 So so sorry this went under my radar!! Have you tried saving the merged model to 4bit instead of 16bit? I.e. `model.push_to_hub_merged("name/model", tokenizer, save_method = "merged_4bit_forced")`. This can save around 1GB of VRAM.

Also, I'm assuming an RTX 3060 has 8GB, right? Generally, Mistral requires around 4.5GB of VRAM using the 4bit merged approach I described. Without it, loading can take 5 to 6GB of VRAM, which might be causing the issues you're describing.
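
A sketch of the save-then-reload workflow suggested above, built around the `push_to_hub_merged` call from the previous comment; the repo id and token are placeholders:

```python
from unsloth import FastLanguageModel

# After training: push merged weights in 4-bit rather than 16-bit,
# using the save_method cited above.
model.push_to_hub_merged(
    "your-username/your-model",          # placeholder repo id
    tokenizer,
    save_method = "merged_4bit_forced",
    token = "hf_...",                    # your HF write token
)

# Later, for inference: reload the 4-bit merged checkpoint. Per the
# estimate above, Mistral should need roughly 4.5GB of VRAM this way.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "your-username/your-model",
    max_seq_length = 2048,               # illustrative value
    load_in_4bit = True,
)
```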

infuzu-yidisprei commented 6 months ago

There does seem to be a bug here: the error message tells you to pass that argument to offload to CPU, but when you pass the argument, it isn't recognized.

danielhanchen commented 6 months ago

@infuzu-yidisprei Oh, CPU offloading in HF doesn't work with Unsloth - I can make it work on Unsloth's side, but it'll require a bit of work :(