CurtiusSimplus opened this issue 1 month ago
if True: model.push_to_hub_gguf("name/MistralTextSummary", tokenizer, quantization_method="q8_0", token="<valid token that works for the vLLM push to HF>")
if True: model.save_pretrained_gguf("UserAI.gguf", tokenizer)
Same result... "file already in the pool"... IDK. I can save as vLLM 16-bit, 4-bit, and the LoRA, so it's not a complete loss.
Oh that's not an issue - this is the latest Unsloth, right? Which model is this?
That particular one is a Mistral Instruct v3 that saved fine using the IDENTICAL code the week before, IIRC. No problems.
But as of yesterday ...
It is every model I tried... they won't push to HF, but this particular one was a Mistral Instruct v3... It has been an off-and-on issue.
They will push in 16-bit, 4-bit, and LoRA formats as 'bin' but not GGUF. IDK.
And it won't save locally using this code either -- sorry about the errors, bad eyesight.
if True: model.push_to_hub_gguf("<user name here, which is correct>", tokenizer, quantization_method="q8_0", token="<hf token is correct, as it works to push to vLLM as 16 or 4 bit with ease>")
So the upside is that if the GGUF save fails, it will still save as 16-bit and 4-bit using the SAME credentials.
This is the Unsloth code being used to start the script...
%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip uninstall transformers -y && pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git"
Did it again... identical error... "file already in the pool"... no idea.
Ok hmm so Mistral Instruct v3? Hmm
Among others... Mistral Small, Mistral Instruct... Nemo... all of them will NOT push to HF as GGUF... it is not model dependent!
Using this code won't work for any model I try to save, as of yesterday:
# Save to 8bit Q8_0
if True: model.push_to_hub_gguf("<correct name and model provided>", tokenizer, quantization_method="q8_0", token="<hf token is correct, since it works to push as 16-bit but NOT as GGUF>")
SAME ERROR ...
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 23.51 out of 50.99 RAM for saving.
100%|██████████| 32/32 [00:15<00:00, 2.11it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b... Done.
==((====))== Unsloth: Conversion from QLoRA to GGUF information
[0] Installing llama.cpp will take 3 minutes.
[1] Converting HF to GGUF 16bits will take 3 minutes.
[2] Converting GGUF 16bits to ['q8_0'] will take 10 minutes each.
In total, you will have to wait at least 16 minutes.
TypeError                                 Traceback (most recent call last)
4 frames
/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_options, serialized_pb, dependencies, public_dependencies, syntax, pool, create_key)
   1022     name=None,
   1023     full_name=None,
-> 1024     index=None,
   1025     methods=None,
   1026     options=None,

TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool.
The error is only with GGUF...
16 bit gives this:
Unsloth: You are pushing to hub, but you passed your HF username = AlSamCur123. We shall truncate AlSamCur123/MistralContinuedFine to MistralContinuedFine
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 4.1G
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 20.71 out of 50.99 RAM for saving.
 47%|████▋ | 15/32 [00:01<00:01, 15.52it/s] We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:21<00:00, 1.50it/s]
Unsloth: Saving tokenizer...
100% 1/1 [00:01<00:00, 1.40s/it]
tokenizer.model: 100% 587k/587k [00:01<00:00, 495kB/s]
Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
README.md: 100% 610/610 [00:00<00:00, 43.8kB/s]
100% 3/3 [02:08<00:00, 42.81s/it]
model-00001-of-00003.safetensors: 4.96G/? [00:45<00:00, 539MB/s]
model-00003-of-00003.safetensors: 4.56G/? [00:39<00:00, 761MB/s]
model-00002-of-00003.safetensors: 5.01G/? [00:43<00:00, 666MB/s]
Done. Saved merged model to https://huggingface.co/HF model
Unsloth: Saving LoRA adapters. Please wait...
README.md: 100% 610/610 [00:00<00:00, 52.0kB/s]
config.json: 100% 1.21k/1.21k [00:00<00:00, 114kB/s]
100% 1/1 [00:02<00:00, 2.38s/it]
adapter_model.safetensors: 176M/? [00:02<00:00, 107MB/s]
100% 1/1 [00:01<00:00, 1.22s/it]
tokenizer.model: 100% 587k/587k [00:00<00:00, 596kB/s]
Saved lora model to https://huggingface.co/HF/Model
So `if True: model.push_to_hub_merged(...)` works, using the same token as the call that errors,
but `if True: model.push_to_hub_gguf(...)` does not... it gives the error... THE TOKEN IS CORRECT...
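For reference, this is the shape of the two calls side by side (repo name and token are redacted placeholders, not my real values); the merged push completes, the GGUF push throws the proto-pool error:

```python
# Redacted sketch of the two calls from my notebook.
# The merged 16-bit push succeeds; the GGUF push fails with the
# "sentencepiece_model.proto ... already in the pool" TypeError.
if True:
    model.push_to_hub_merged("username/ModelName", tokenizer,
                             save_method = "merged_16bit", token = "hf_...")

if True:
    model.push_to_hub_gguf("username/ModelName", tokenizer,
                           quantization_method = "q8_0", token = "hf_...")
```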
As of yesterday it is every model I tried: Instruct v3, Small, and Nemo... It only saves the README and .gitattributes files, NO GGUF... none. It just errors out. It won't save them locally to Colab either.
It just won't save them. Why train a model I can't save or use as GGUF? This has cost me 10 dollars over the last two days... in my poor world that is a LOT of money...
This is the code I am using...
# Save to 8bit Q8_0
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "q8_0", token = "hf")
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "f16", token = "hf")
if True: model.save_pretrained_gguf("UserAI.gguf", tokenizer, quantization_method = "q4_0")
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "q4_0", token = "hf")
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "q6_k", token = "hf")
if True: model.push_to_hub_gguf("", tokenizer, quantization_method = "q5_k_m", token = "hf")
This worked for MONTHS with no issue, until a day or so ago when this "file in pool" nonsense started.
And right there on cue, another error... this is getting expensive...
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 23.39 out of 50.99 RAM for saving.
100%|██████████| 32/32 [00:22<00:00, 1.44it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b... Done.
==((====))== Unsloth: Conversion from QLoRA to GGUF information
[0] Installing llama.cpp will take 3 minutes.
[1] Converting HF to GGUF 16bits will take 3 minutes.
[2] Converting GGUF 16bits to ['q8_0'] will take 10 minutes each.
In total, you will have to wait at least 16 minutes.
TypeError                                 Traceback (most recent call last)
4 frames
/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_options, serialized_pb, dependencies, public_dependencies, syntax, pool, create_key)
   1022     name=None,
   1023     full_name=None,
-> 1024     index=None,
   1025     methods=None,
   1026     options=None,

TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool.
So can you give me a script to import my merged_16bit model and save it as GGUF? Is that possible? Because otherwise this is all money burnt.
Hours-long training runs that won't save are demoralizing... that's three in two days.
@CurtiusSimplus Have you tried manually saving it to GGUF? https://github.com/unslothai/unsloth/wiki#manually-saving-to-gguf
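If it helps, the manual route is roughly: clone llama.cpp, install its requirements, then run its HF-to-GGUF converter on the merged 16-bit folder. A rough sketch, not a verified recipe: the folder names are placeholders, and the converter script may be named convert-hf-to-gguf.py on older llama.cpp checkouts:

```python
# Rough sketch of the manual GGUF conversion, assuming the merged 16-bit model
# was already saved to ./merged_model (placeholder path).
import subprocess

# Get llama.cpp and the Python deps its convert script needs.
subprocess.run(["git", "clone", "--depth", "1",
                "https://github.com/ggerganov/llama.cpp"], check=True)
subprocess.run(["pip", "install", "-r", "llama.cpp/requirements.txt"], check=True)

# Convert the HF-format folder straight to a q8_0 GGUF file.
subprocess.run(["python", "llama.cpp/convert_hf_to_gguf.py", "merged_model",
                "--outfile", "model-q8_0.gguf", "--outtype", "q8_0"], check=True)
```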
Another temp solution is to only save the LoRA adapters via model.save_pretrained
or model.save_pretrained_merged(..., save_method = "lora")
then reload them with Unsloth in a fresh environment (after shutting the old one down) and save to GGUF from there, roughly like the sketch below
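A minimal sketch of that workaround (folder name, repo name and token are placeholders, and step 2 assumes FastLanguageModel.from_pretrained can load the saved adapter folder directly, as in the Unsloth example notebooks):

```python
from unsloth import FastLanguageModel

# 1) In the current session: save only the LoRA adapters (placeholder folder name).
model.save_pretrained_merged("lora_model", tokenizer, save_method = "lora")

# 2) In a fresh runtime: reload the adapters with Unsloth...
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "lora_model",   # the adapter folder saved above
    max_seq_length = 2048,
    dtype          = None,
    load_in_4bit   = True,
)

# 3) ...then retry the GGUF export (repo name and token are placeholders).
model.push_to_hub_gguf("username/ModelName", tokenizer,
                       quantization_method = "q8_0", token = "hf_...")
```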
I'm assuming you're using the same instance over and over again? Something might have broken.
Also have you considered joining our Discord channel and asking there? You will get more live and faster responses
Trying to load the float16 merged model... will try to save to GGUF directly...
Model will load:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "my model loads and trains but won't save as gguf",  # Choose ANY! e.g. teknium/OpenHermes-2.5-Mistral-7B, unsloth/mistral-7b-instruct-v0.3
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
The model does load... Next I will test inference:
[' Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nContinue the fibonnaci sequence.\n\n### Input:\n1, 1, 2, 3, 5, 8\n\n### Response:\n13\n\nThe next number in the Fibonacci sequence is found by adding the two previous numbers. In this case, the last two numbers are 8 and 5, so the next number is 8 + 5 = 13.']
Next I will test the save to GGUF and see if it works.
identical error ... TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool.
If I save the QLoRA I can use another vanilla script... Thanks!
You can try converting the merged_model in your HF repo to GGUF manually with this space.
Cool man thanks!
Well that failed too ...
It just gives an error .... that says 'error'.
@CurtiusSimplus Have you tried manually saving it to GGUF? https://github.com/unslothai/unsloth/wiki#manually-saving-to-gguf
I have tried this... I can't get it to work. Sorry, it's above my level; I don't know what inputs and outputs it wants.
@Erland366 Mistral Instruct v3 seems to be the main blocker for exporting to GGUF
So Mistral v3 says it's using a legacy tokenizer?
I can save to vLLM 16-bit, and I did figure out how to use llama.cpp (not in Colab... learning curves), but still no Mistral Instruct v3 train I do will save to GGUF directly.
Nemo will.
Mistral SMALL did last I checked ....
All with the same script, just with the HF model name changed.
Mistral fails, as you say... most others don't.
Does that legacy tokenizer warning have something to do with it?
Again: I can save to vLLM 16-bit and have figured out how to save manually, so my trains are NOT lost... they just won't push to Hugging Face as GGUF with certain models.
I will check to see if they are all 'legacy' models. That might be the issue, since to my knowledge that is a NEW warning as of maybe a week ago.
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 23.73 out of 50.99 RAM for saving.
100%|██████████| 32/32 [00:19<00:00, 1.67it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b... Done.
==((====))== Unsloth: Conversion from QLoRA to GGUF information
[0] Installing llama.cpp will take 3 minutes.
[1] Converting HF to GGUF 16bits will take 3 minutes.
[2] Converting GGUF 16bits to ['q8_0'] will take 10 minutes each.
In total, you will have to wait at least 16 minutes.
Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at Into q8_0 GGUF format. The output location will be unsloth.Q8_0.gguf This will take 3 minutes...
TypeError                                 Traceback (most recent call last)
in <cell line: 4>()
      2 # Save to 8bit Q8_0
      3 #if True: model.save_pretrained_gguf("UserAI.gguf", tokenizer,)
----> 4 if True: model.push_to_hub_gguf("mINE", tokenizer, quantization_method="q8_0", token="EDITED")
      5
      6 # Save to 16bit GGUF

4 frames
/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_options, serialized_pb, dependencies, public_dependencies, syntax, pool, create_key)
   1022     name=None,
   1023     full_name=None,
-> 1024     index=None,
   1025     methods=None,
   1026     options=None,

TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool.