unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Issue saving mistral-7b-instruct-v0.3-bnb-4bit to GGUF #1197

Open Linguiniotta opened 1 month ago

Linguiniotta commented 1 month ago

I hit an issue while saving `unsloth/mistral-7b-instruct-v0.3-bnb-4bit` to GGUF after training, both in Kaggle and with gguf-my-repo.

I have tried converting both the uploaded LoRA and the merged version, with gguf-my-repo and with the built-in `push_to_hub_gguf` method.

# Saves successfully
tokenizer.push_to_hub(LORA_ONLY, private=True)
peft_model.push_to_hub(
    LORA_ONLY,
    tokenizer,
    save_method='lora',
)

# Saves successfully
tokenizer.push_to_hub(MERGED_MODEL, private=True)
peft_model.push_to_hub_merged(
    MERGED_MODEL,
    tokenizer,
)

%cd /tmp
tokenizer.push_to_hub(GGUF_MODEL, private=True)
# Error: disk-space issue, but there is also mention of vocab errors
# that I think are similar to the errors encountered in gguf-my-repo
peft_model.push_to_hub_gguf(
    GGUF_MODEL, tokenizer,
    quantization_method = [
        'q4_k_m',
        'q5_k_m',
        'q6_k',
    ],
)
`push_to_hub_gguf` output:

```
/tmp
No files have been modified since last commit. Skipping to prevent empty commit.
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 16.08 out of 31.36 RAM for saving.
100%|██████████| 32/32 [00:19<00:00, 1.62it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Unsloth: Saving My_HF_Username/MistralInstruct-v0.3-FT-GGUF/pytorch_model-00001-of-00003.bin...
Unsloth: Saving My_HF_Username/MistralInstruct-v0.3-FT-GGUF/pytorch_model-00002-of-00003.bin...
Unsloth: Saving My_HF_Username/MistralInstruct-v0.3-FT-GGUF/pytorch_model-00003-of-00003.bin...
Done.
 ==((====))==  Unsloth: Conversion from QLoRA to GGUF information
    \\   /|    [0] Installing llama.cpp will take 3 minutes.
O^O/ \_/ \     [1] Converting HF to GGUF 16bits will take 3 minutes.
\        /     [2] Converting GGUF 16bits to ['q4_k_m', 'q6_k'] will take 10 minutes each.
 "-____-"      In total, you will have to wait at least 16 minutes.
Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at My_HF_Username/MistralInstruct-v0.3-FT-GGUF into f16 GGUF format.
The output location will be /tmp/My_HF_Username/MistralInstruct-v0.3-FT-GGUF/unsloth.F16.gguf
This will take 3 minutes...
Traceback (most recent call last):
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 1524, in set_vocab
    self._set_vocab_sentencepiece()
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 747, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 764, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: My_HF_Username/MistralInstruct-v0.3-FT-GGUF/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 1527, in set_vocab
    self._set_vocab_llama_hf()
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 839, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
  File "/tmp/llama.cpp/gguf-py/gguf/vocab.py", line 402, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 920, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2213, in from_pretrained
    return cls._from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2447, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
    super().__init__(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 132, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 169, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 199, in get_spm_processor
    with open(self.vocab_file, "rb") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 4436, in <module>
    main()
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 4430, in main
    model_instance.write()
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 434, in write
    self.prepare_metadata(vocab_only=False)
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 427, in prepare_metadata
    self.set_vocab()
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 1530, in set_vocab
    self._set_vocab_gpt2()
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 683, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
  File "/tmp/llama.cpp/convert_hf_to_gguf.py", line 511, in get_vocab_base
    tokenizer = AutoTokenizer.from_pretrained(self.dir_model)
  [... same transformers tokenizer frames as in the previous traceback ...]
TypeError: expected str, bytes or os.PathLike object, not NoneType
Something went wrong saving to GGUF…
Unsloth: Quantization failed for /tmp/My_HF_Username/MistralInstruct-v0.3-FT-GGUF/unsloth.F16.gguf
You are in a Kaggle environment, which might be the reason this is failing.
Kaggle only provides 20GB of disk space. Merging to 16bit for 7b models use 16GB of space.
This means using `model.{save_pretrained/push_to_hub}_merged` works, but `model.{save_pretrained/push_to_hub}_gguf` will use too much disk space.
I suggest you to save the 16bit model first, then use manual llama.cpp conversion.
```
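For reference, the manual llama.cpp route the log suggests might look roughly like this. This is only a sketch: it assumes llama.cpp is already cloned and built under `/tmp/llama.cpp`, that the merged 16-bit model was saved locally first, and the paths and repo names are placeholders.

```python
# Sketch of a manual HF -> GGUF conversion under the assumptions stated above.
import subprocess

merged_dir = "/tmp/My_HF_Username/MistralInstruct-v0.3-FT-Merged"  # placeholder path
f16_gguf   = "/tmp/unsloth.F16.gguf"

# Convert the merged 16-bit HF checkpoint to an F16 GGUF file.
subprocess.run(
    ["python", "/tmp/llama.cpp/convert_hf_to_gguf.py", merged_dir,
     "--outfile", f16_gguf, "--outtype", "f16"],
    check=True,
)

# Quantize the F16 GGUF to q4_k_m (requires the llama-quantize binary to be built).
subprocess.run(
    ["/tmp/llama.cpp/llama-quantize", f16_gguf, "/tmp/unsloth.Q4_K_M.gguf", "q4_k_m"],
    check=True,
)
```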
With a `config.json` manually duplicated from `adapter_config.json`, based on issue #421:

GGUF-my-repo for LoRA error:

```
Error converting to fp16: b'INFO:hf-to-gguf:Loading model: MistralInstruct-v0.3-FT
Traceback (most recent call last):
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 4433, in <module>
    main()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 4405, in main
    model_architecture = hparams["architectures"][0]
KeyError: 'architectures'
'
```
GGUF-my-repo for Merged error:

```
Error converting to fp16: b'INFO:hf-to-gguf:Loading model: MistralInstruct-v0.3-FT-Merged
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'pytorch_model.bin.index.json'
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00001-of-00003.bin'
INFO:hf-to-gguf:token_embd.weight, torch.float16 --> F16, shape = {4096, 32768}
[... per-tensor conversion lines for blk.0 through blk.31 (model parts 00001-00003) elided ...]
INFO:hf-to-gguf:output_norm.weight, torch.float16 --> F32, shape = {4096}
INFO:hf-to-gguf:output.weight, torch.float16 --> F16, shape = {4096, 32768}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 32768
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 1521, in set_vocab
    self._set_vocab_sentencepiece()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 744, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 761, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: MistralInstruct-v0.3-FT-Merged/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 1524, in set_vocab
    self._set_vocab_llama_hf()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 836, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
  File "/home/user/app/llama.cpp/gguf-py/gguf/vocab.py", line 402, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 907, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2208, in from_pretrained
    return cls._from_pretrained(
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2442, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
    super().__init__(
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 132, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 171, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 201, in get_spm_processor
    with open(self.vocab_file, "rb") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 4433, in <module>
    main()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 4427, in main
    model_instance.write()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 434, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 427, in prepare_metadata
    self.set_vocab()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 1527, in set_vocab
    self._set_vocab_gpt2()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 680, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 511, in get_vocab_base
    tokenizer = AutoTokenizer.from_pretrained(self.dir_model)
  [... same transformers tokenizer frames as in the previous traceback ...]
TypeError: expected str, bytes or os.PathLike object, not NoneType
'
```

This is the same method I've been using for `mistral-7b-instruct-v0.2-bnb-4bit`, although I will try again to see whether the issue is specific to v0.3.

CurtiusSimplus commented 1 month ago

Do you get this error at all?

```
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 23.73 out of 50.99 RAM for saving.
100%|██████████| 32/32 [00:19<00:00, 1.67it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b... Done.
 ==((====))==  Unsloth: Conversion from QLoRA to GGUF information
    \\   /|    [0] Installing llama.cpp will take 3 minutes.
O^O/ \_/ \     [1] Converting HF to GGUF 16bits will take 3 minutes.
\        /     [2] Converting GGUF 16bits to ['q8_0'] will take 10 minutes each.
 "-____-"      In total, you will have to wait at least 16 minutes.

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at User/Model into q8_0 GGUF format.
The output location will be /content/unsloth.Q8_0.gguf
This will take 3 minutes...
```

```
TypeError                                 Traceback (most recent call last)
in <cell line: 4>()
      2 # Save to 8bit Q8_0
      3 #if True: model.save_pretrained_gguf("UserAI.gguf", tokenizer,)
----> 4 if True: model.push_to_hub_gguf("Me", tokenizer, quantization_method="q8_0", token="Mine")
      5
      6 # Save to 16bit GGUF

4 frames
/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_options, serialized_pb, dependencies, public_dependencies, syntax, pool, create_key)
   1022     name=None,
   1023     full_name=None,
-> 1024     index=None,
   1025     methods=None,
   1026     options=None,

TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool.
```

danielhanchen commented 4 weeks ago

@Linguiniotta Kaggle sadly has limited disk space - the trick is to save LoRAs, then load it in Colab, then merge.
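
That workflow might look roughly like this in Colab. This is only a sketch: the repo names are placeholders, and a `token=` argument may be needed for private repos.

```python
from unsloth import FastLanguageModel

# Load the LoRA adapters that were pushed from Kaggle (placeholder repo name),
# then merge and export in Colab, where there is more disk space.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "My_HF_Username/MistralInstruct-v0.3-FT-LoRA",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Merge to 16-bit and push, then convert to GGUF.
model.push_to_hub_merged("My_HF_Username/MistralInstruct-v0.3-FT-Merged", tokenizer)
model.push_to_hub_gguf(
    "My_HF_Username/MistralInstruct-v0.3-FT-GGUF", tokenizer,
    quantization_method = ["q4_k_m", "q5_k_m", "q6_k"],
)
```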

@CurtiusSimplus Oh yep I saw you opened a new issue

Linguiniotta commented 4 weeks ago

It merges successfully, but I can no longer convert it to GGUF with the built-in `push_to_hub_gguf`, for both v0.2 and v0.3, even though it used to work. My last GGUF export (q4_k_m, q5_k_m, q6_k) succeeded with the built-in `push_to_hub_gguf` on October 12, 2024 at ~12:16 pm. My original Kaggle environment is pinned to 2024-10-04. So perhaps a recent update broke the `push_to_hub_gguf` function? The saving method is the same across runs, as shown in my original post.

With gguf-my-repo, I can successfully convert v0.2, but fails with v0.3, with the same error output as my main post.

Also, is it possible to change the location of `huggingface_tokenizers_cache/models--unsloth--mistral-7b-instruct-v0.3-bnb-4bit`? I think this directory gets auto-generated during saving, by default under `/kaggle/working`. Could it be changed to `/tmp`, which has around 80-90 GB of disk space?
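
As a sketch of one way this could work, the standard Hugging Face cache environment variables can be pointed at `/tmp` before loading anything; this assumes the directory in question actually honours those variables, which it may not if the path is hard-coded in Unsloth's saving code.

```python
import os

# Point the Hugging Face caches at /tmp *before* importing unsloth/transformers.
# Only helps if the cache directory above actually follows these variables.
os.environ["HF_HOME"] = "/tmp/hf_home"
os.environ["HF_HUB_CACHE"] = "/tmp/hf_home/hub"

from unsloth import FastLanguageModel  # imported after the environment is set
```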

danielhanchen commented 4 weeks ago

@Linguiniotta Wait, is /tmp actually allowed to be used? If yes, I will gladly edit all notebooks to use it!!! (Maybe try doing `cat` and `ls` on a file inside of /tmp.)
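
Something along these lines would be enough to check (a hypothetical snippet, not from any notebook):

```python
import pathlib, shutil

# Write a file into /tmp and read it back (the "cat" test), then list the
# directory (the "ls" test) and report total/used/free bytes.
test_file = pathlib.Path("/tmp/unsloth_write_test.txt")
test_file.write_text("hello from /tmp")
print(test_file.read_text())
print(sorted(p.name for p in pathlib.Path("/tmp").iterdir())[:10])
print(shutil.disk_usage("/tmp"))
```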

Linguiniotta commented 4 weeks ago

Yes, it is available to use, around ~90 GB; that's where I had been storing things while saving XD. My session metrics previously showed 81.6 GiB / 57.6 GiB.

I have not done it with this newer update of Unsloth yet, but here is the result of the saved GGUF from my previous notebooks. To save space and time, and since the merged model gets saved locally anyway, I just move it to the destination expected by the GGUF conversion.

%cd /tmp
peft_model.push_to_hub_merged(
    MERGED_MODEL, # The HF username/repo path for the merged model, gets saved to current dir
    tokenizer,
)
Output:

```
/tmp
Unsloth: You are pushing to hub, but you passed your HF username = My_HF_Username.
We shall truncate My_HF_Username/MistralInstruct-v0.2-FT-Merged to MistralInstruct-v0.2-FT-Merged
Unsloth: You have 2 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which will take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 4.1G
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 19.59 out of 31.36 RAM for saving.
  3%|▎          | 1/32 [00:00<00:03,  8.61it/s]We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:45<00:00,  1.41s/it]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Unsloth: Saving MistralInstruct-v0.2-FT-Merged/pytorch_model-00001-of-00003.bin...
Unsloth: Saving MistralInstruct-v0.2-FT-Merged/pytorch_model-00002-of-00003.bin...
Unsloth: Saving MistralInstruct-v0.2-FT-Merged/pytorch_model-00003-of-00003.bin...
```
!mkdir -p /tmp/{GGUF_MODEL} && mv /tmp/{MERGED_MODEL}/* /tmp/{GGUF_MODEL}

Result after `push_to_hub_gguf` (older Unsloth version):

du -ha /tmp/{GGUF_MODEL} | sort -h

```
4.0K  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/config.json
4.0K  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/generation_config.json
4.0K  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/special_tokens_map.json
4.0K  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/tokenizer_config.json
8.0K  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/README.md
24K   /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/pytorch_model.bin.index.json
3.4M  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/tokenizer.json
4.1G  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/unsloth.Q4_K_M.gguf
4.3G  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/pytorch_model-00003-of-00003.bin
4.7G  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/pytorch_model-00001-of-00003.bin
4.7G  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/pytorch_model-00002-of-00003.bin
4.8G  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/unsloth.Q5_K_M.gguf
5.6G  /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/unsloth.Q6_K.gguf
14G   /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF/unsloth.F16.gguf
42G   /tmp/My_HF_Username/MistralInstruct-v0.2-FT-GGUF
```

dendarrion commented 4 weeks ago

Following the error in the latter parts, I think this check could be the culprit: #1201

Although, upon checking, there have been no changes to these lines since around October, so it is possible the issue is not here?

https://github.com/unslothai/unsloth/blob/49ae6194122b594a7054da0bfd6f387cf720f40f/unsloth/save.py#L1110-L1118
https://github.com/unslothai/unsloth/blob/49ae6194122b594a7054da0bfd6f387cf720f40f/unsloth/save.py#L1158-L1166