CurtiusSimplus opened this issue 1 month ago
if True: model.push_to_hub_gguf("name/MistralTextSummary", tokenizer, quantization_method="q8_0", token="<valid token that works for the vLLM push to HF>")
if True: model.save_pretrained_gguf("UserAI.gguf", tokenizer)
Same result... "file already in the pool"... IDK. I can save as vLLM 16-bit, 4-bit, and the LoRA, so it's not a complete loss.
Oh that's not an issue - this is the latest Unsloth, right? Which model is this?
That particular one is a Mistral Instruct v3 that saved fine using the IDENTICAL code the week before, IIRC. No problems.
But as of yesterday ...
It is every model I tried... they won't push to HF, but this particular one was a Mistral Instruct v3... It has been an off-and-on issue.
They will push in 16-bit, 4-bit, and LoRA formats as 'bin' but not GGUF. IDK.
And it won't save locally using this code either -- sorry about the errors, bad eyesight.
if True: model.push_to_hub_gguf("<user name here, which is correct>", tokenizer, quantization_method="q8_0", token="<hf token is correct, as it works to push to vLLM as 16 or 4 bit with ease>")
So the upside is that if the GGUF save fails, it will still save as 16-bit and 4-bit using the SAME credentials.
This is the Unsloth code being used to start the script...
%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip uninstall transformers -y && pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git"
Did it again... identical error... "file already in the pool"... no idea.
Ok hmm so Mistral Instruct v3? Hmm
Among others... Mistral Small, Mistral Instruct... Nemo... all of them will NOT push to HF as GGUF... it is not model dependent!
Using this code won't work for any model I try to save, as of yesterday:
# Save to 8bit Q8_0
if True: model.push_to_hub_gguf("<correct name and model provided>", tokenizer, quantization_method="q8_0", token="<hf token is correct, since it works to push as 16-bit but NOT as GGUF>")
SAME ERROR ...
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 23.51 out of 50.99 RAM for saving.
100%|██████████| 32/32 [00:15<00:00, 2.11it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b... Done.
==((====))== Unsloth: Conversion from QLoRA to GGUF information
[0] Installing llama.cpp will take 3 minutes.
[1] Converting HF to GGUF 16bits will take 3 minutes.
[2] Converting GGUF 16bits to ['q8_0'] will take 10 minutes each.
In total, you will have to wait at least 16 minutes.
TypeError                                 Traceback (most recent call last)
4 frames
/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_options, serialized_pb, dependencies, public_dependencies, syntax, pool, create_key)
   1022     name=None,
   1023     full_name=None,
-> 1024     index=None,
   1025     methods=None,
   1026     options=None,

TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool.
The error is only with GGUF...
16 bit gives this:
Unsloth: You are pushing to hub, but you passed your HF username = AlSamCur123. We shall truncate AlSamCur123/MistralContinuedFine to MistralContinuedFine
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 4.1G
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 20.71 out of 50.99 RAM for saving.
 47%|████▋ | 15/32 [00:01<00:01, 15.52it/s] We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:21<00:00, 1.50it/s]
Unsloth: Saving tokenizer...
100% 1/1 [00:01<00:00, 1.40s/it]
tokenizer.model: 100% 587k/587k [00:01<00:00, 495kB/s]
Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
README.md: 100% 610/610 [00:00<00:00, 43.8kB/s]
100% 3/3 [02:08<00:00, 42.81s/it]
model-00001-of-00003.safetensors: 4.96G/? [00:45<00:00, 539MB/s]
model-00003-of-00003.safetensors: 4.56G/? [00:39<00:00, 761MB/s]
model-00002-of-00003.safetensors: 5.01G/? [00:43<00:00, 666MB/s]
Done. Saved merged model to https://huggingface.co/HF model
Unsloth: Saving LoRA adapters. Please wait...
README.md: 100% 610/610 [00:00<00:00, 52.0kB/s]
config.json: 100% 1.21k/1.21k [00:00<00:00, 114kB/s]
100% 1/1 [00:02<00:00, 2.38s/it]
adapter_model.safetensors: 176M/? [00:02<00:00, 107MB/s]
100% 1/1 [00:01<00:00, 1.22s/it]
tokenizer.model: 100% 587k/587k [00:00<00:00, 596kB/s]
Saved lora model to https://huggingface.co/HF/Model
So `if True: model.push_to_hub_merged(...)` works, using the same token as the call that errors,
but `if True: model.push_to_hub_gguf(...)` does not... it gives the error... THE TOKEN IS CORRECT...
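For reference, this is the shape of the two calls side by side (repo name and token are redacted placeholders, not my real values); the merged push completes, the GGUF push throws the proto-pool error:

```python
# Redacted sketch of the two calls from my notebook.
# The merged 16-bit push succeeds; the GGUF push fails with the
# "sentencepiece_model.proto ... already in the pool" TypeError.
if True:
    model.push_to_hub_merged("username/ModelName", tokenizer,
                             save_method = "merged_16bit", token = "hf_...")

if True:
    model.push_to_hub_gguf("username/ModelName", tokenizer,
                           quantization_method = "q8_0", token = "hf_...")
```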
As of yesterday it is every model I tried: Instruct v3, Small, and Nemo... It only saves the README and .gitattributes files, NO GGUF... none. It just errors out. It won't save them locally to Colab either.
It just won't save them. Why train a model I can't save or use as GGUF? This has cost me 10 dollars over the last two days... in my poor world that is a LOT of money...
This is the code I am using...
# Save to 8bit Q8_0
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "q8_0", token = "hf")
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "f16", token = "hf")
if True: model.save_pretrained_gguf("UserAI.gguf", tokenizer, quantization_method = "q4_0")
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "q4_0", token = "hf")
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "q6_k", token = "hf")
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "q5_k_m", token = "hf")
This worked for MONTHS with no issue, until a day or so ago when this "file in pool" nonsense started.
And right there on cue, another error... this is getting expensive...
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 23.39 out of 50.99 RAM for saving.
100%|██████████| 32/32 [00:22<00:00, 1.44it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b... Done.
==((====))== Unsloth: Conversion from QLoRA to GGUF information
[0] Installing llama.cpp will take 3 minutes.
[1] Converting HF to GGUF 16bits will take 3 minutes.
[2] Converting GGUF 16bits to ['q8_0'] will take 10 minutes each.
In total, you will have to wait at least 16 minutes.
TypeError                                 Traceback (most recent call last)
4 frames
/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_options, serialized_pb, dependencies, public_dependencies, syntax, pool, create_key)
   1022     name=None,
   1023     full_name=None,
-> 1024     index=None,
   1025     methods=None,
   1026     options=None,

TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool.
So can you give me a script to import my merged_16bit model and save it as GGUF? Is that possible? Because otherwise this is all money burnt.
Hours-long training runs that won't save are demoralizing... that's three in two days.
@CurtiusSimplus Have you tried manually saving it to GGUF? https://github.com/unslothai/unsloth/wiki#manually-saving-to-gguf
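If it helps, the manual route is roughly: clone llama.cpp, install its requirements, then run its HF-to-GGUF converter on the merged 16-bit folder. A rough sketch, not a verified recipe: the folder names are placeholders, and the converter script may be named convert-hf-to-gguf.py on older llama.cpp checkouts:

```python
# Rough sketch of the manual GGUF conversion, assuming the merged 16-bit model
# was already saved to ./merged_model (placeholder path).
import subprocess

# Get llama.cpp and the Python deps its convert script needs.
subprocess.run(["git", "clone", "--depth", "1",
                "https://github.com/ggerganov/llama.cpp"], check=True)
subprocess.run(["pip", "install", "-r", "llama.cpp/requirements.txt"], check=True)

# Convert the HF-format folder straight to a q8_0 GGUF file.
subprocess.run(["python", "llama.cpp/convert_hf_to_gguf.py", "merged_model",
                "--outfile", "model-q8_0.gguf", "--outtype", "q8_0"], check=True)
```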
Another temp solution is to only save the LoRA adapters via model.save_pretrained
or model.save_pretrained_merged(..., save_method = "lora")
then reload them with Unsloth in a fresh environment (after shutting the old one down) and save to GGUF from there, roughly like the sketch below
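A minimal sketch of that workaround (folder name, repo name and token are placeholders, and step 2 assumes FastLanguageModel.from_pretrained can load the saved adapter folder directly, as in the Unsloth example notebooks):

```python
from unsloth import FastLanguageModel

# 1) In the current session: save only the LoRA adapters (placeholder folder name).
model.save_pretrained_merged("lora_model", tokenizer, save_method = "lora")

# 2) In a fresh runtime: reload the adapters with Unsloth...
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "lora_model",   # the adapter folder saved above
    max_seq_length = 2048,
    dtype          = None,
    load_in_4bit   = True,
)

# 3) ...then retry the GGUF export (repo name and token are placeholders).
model.push_to_hub_gguf("username/ModelName", tokenizer,
                       quantization_method = "q8_0", token = "hf_...")
```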
I'm assuming you're using the same instance over and over again? Something might have broken.
Also have you considered joining our Discord channel and asking there? You will get more live and faster responses
Trying to load the float16 merged model... will try to save to GGUF directly...
Model will load:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "my model loads and trains but won't save as gguf",  # Choose ANY! e.g. teknium/OpenHermes-2.5-Mistral-7B, unsloth/mistral-7b-instruct-v0.3
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
The model does load... Next I will test inference:
[' Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nContinue the fibonnaci sequence.\n\n### Input:\n1, 1, 2, 3, 5, 8\n\n### Response:\n13\n\nThe next number in the Fibonacci sequence is found by adding the two previous numbers. In this case, the last two numbers are 8 and 5, so the next number is 8 + 5 = 13.']
Next I will test the save to GGUF and see if it works.
identical error ... TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool.
If I save the QLoRA I can use another vanilla script... Thanks!
You can try converting the merged_model in your HF repo to GGUF manually with this space.
Cool man thanks!
Well that failed too ...
It just gives an error .... that says 'error'.
@CurtiusSimplus Have you tried manually saving it to GGUF? https://github.com/unslothai/unsloth/wiki#manually-saving-to-gguf
I have tried this... I can't get it to work. Sorry, it's above my level; I don't know what inputs and outputs it wants.
@Erland366 Mistral Instruct v3 seems to be the main blocker for exporting to GGUF
So Mistral v3 says it's using a legacy tokenizer?
I can save to vLLM 16-bit, and I did figure out how to use llama.cpp (not in Colab... learning curves), but still no Mistral Instruct v3 train I do will save to GGUF directly.
Nemo will.
Mistral SMALL did last I checked ....
All with the same script, just with the HF model name changed.
Mistral fails, as you say... most others don't.
Does that legacy tokenizer warning have something to do with it?
Again: I can save to vLLM 16-bit and have figured out how to save manually, so my trains are NOT lost... they just won't push to Hugging Face as GGUF with certain models.
I will check to see if they are all 'legacy' models. That might be the issue, since to my knowledge that is a NEW warning as of maybe a week ago.
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 23.73 out of 50.99 RAM for saving.
100%|██████████| 32/32 [00:19<00:00, 1.67it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b... Done.
==((====))== Unsloth: Conversion from QLoRA to GGUF information
[0] Installing llama.cpp will take 3 minutes.
[1] Converting HF to GGUF 16bits will take 3 minutes.
[2] Converting GGUF 16bits to ['q8_0'] will take 10 minutes each.
In total, you will have to wait at least 16 minutes.
Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at Into q8_0 GGUF format. The output location will be unsloth.Q8_0.gguf This will take 3 minutes...
TypeError                                 Traceback (most recent call last)
in <cell line: 4>()
      2 # Save to 8bit Q8_0
      3 #if True: model.save_pretrained_gguf("UserAI.gguf", tokenizer,)
----> 4 if True: model.push_to_hub_gguf("mINE", tokenizer, quantization_method="q8_0", token="EDITED")
      5
      6 # Save to 16bit GGUF

4 frames
/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_options, serialized_pb, dependencies, public_dependencies, syntax, pool, create_key)
   1022     name=None,
   1023     full_name=None,
-> 1024     index=None,
   1025     methods=None,
   1026     options=None,

TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool.