icecoldt369 opened 8 months ago
@icecoldt369 Oh wait, does your local machine have a GPU? Maybe I'm misunderstanding your error - it seems like your PC either doesn't have enough GPU VRAM or has no GPU at all?
@danielhanchen Hi! Sorry, it might be a trivial matter, as I'm just not completely understanding the error. Yes, my local machine runs Ubuntu with an Nvidia GeForce RTX 3060. I've got CUDA version 12.3 and made sure to flush my GPU memory after training before loading my weights. I suspect insufficient GPU VRAM might be the issue; if that is the case, is there a way to work around it?
@icecoldt369 Did you load the model with `load_in_4bit = True`?
@danielhanchen Yes, I did not change the kwargs much. It works on Colab, but not when running locally. I then tried adding the missing HF argument when loading the model just to check; unsurprisingly, that didn't work either.
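A minimal sketch of the local loading call being discussed, assuming a placeholder repo name; `max_seq_length` and `dtype` are illustrative defaults, not values confirmed in the thread:

```python
from unsloth import FastLanguageModel

# Hypothetical repo name standing in for the user's merged weights.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "your-username/your-merged-model",
    max_seq_length = 2048,
    dtype = None,         # auto-detect (float16 on an RTX 3060)
    load_in_4bit = True,  # quantized loading; needed to fit in limited VRAM
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode
```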
@icecoldt369 So so sorry this went under my radar!! Have you tried saving the merged model to 4bit instead of 16bit? I.e. `model.push_to_hub_merged("name/model", tokenizer, save_method = "merged_4bit_forced")`
This can save around 1GB of VRAM.
Also, I'm assuming an RTX 3060 has 8GB, right? Generally Mistral requires around 4.5GB of VRAM using the 4bit merged approach I described. If not, it can take 5 to 6GB of VRAM, which might be causing the issues you're describing.
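Roughly like this (the repo and directory names are placeholders):

```python
# Push the merged weights in 4bit rather than 16bit to cut VRAM use.
# "merged_4bit_forced" is required because merging LoRA weights into a
# 4bit base is lossy, so Unsloth asks for the explicit "forced" variant.
model.push_to_hub_merged(
    "your-username/mistral-merged-4bit",  # hypothetical repo name
    tokenizer,
    save_method = "merged_4bit_forced",
)

# Or save locally instead of pushing to the Hub:
model.save_pretrained_merged(
    "mistral-merged-4bit",  # hypothetical local directory
    tokenizer,
    save_method = "merged_4bit_forced",
)
```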
There does seem to be a bug here: the error tells you to pass that argument to offload to the CPU, but when you pass the argument, it isn't recognized.
@infuzu-yidisprei Oh, offloading to CPU in HF doesn't work - I can make it work on Unsloth's side, but it'll require a bit of work :(
Hello, I am trying to download my pretrained model weights and use them for inference in a local notebook. Running the code on Google Colab works gracefully. However, I am encountering this error when attempting to do the same in my local environment. This is the message:
```
File ~/miniconda3/envs/py10/lib/python3.10/site-packages/unsloth/models/loader.py:121, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, use_gradient_checkpointing, *args, **kwargs)
    115 raise NotImplementedError(
    116     f"Unsloth: {model_name} not supported yet!\n"
    117     "Make an issue to https://github.com/unslothai/unsloth!",
    118 )
    119 pass
...
... in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
```
However, once I pass the specified argument to FastLanguageModel.from_pretrained, it does not recognise it. Please let me know how to configure this correctly, thanks!
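For context, the argument the error message suggests belongs to vanilla transformers' quantization config, not to `FastLanguageModel.from_pretrained`. A minimal sketch of what the linked HF docs describe, assuming a placeholder repo name and an illustrative `device_map`; note that, per the discussion above, this CPU offload path is not wired through Unsloth:

```python
# Sketch of the vanilla-transformers approach from the linked HF docs --
# NOT a FastLanguageModel argument.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit = True,
    llm_int8_enable_fp32_cpu_offload = True,  # allow fp32 modules to sit on the CPU
)

model = AutoModelForCausalLM.from_pretrained(
    "your-username/your-merged-model",  # hypothetical repo name
    quantization_config = quant_config,
    device_map = "auto",  # lets accelerate split layers across GPU and CPU
)
```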