microsoft / BitNet

Official inference framework for 1-bit LLMs
MIT License

Running on Colab - convert-hf-to-gguf-bitnet.py stops with "^C" #77

Open. ApurvPujari opened this issue 2 days ago

ApurvPujari commented 2 days ago

I am using Google Colab. I downloaded the "Llama3-8B-1.58-100B-tokens" model, but when I run `!python utils/convert-hf-to-gguf-bitnet.py models/Llama3-8B-1.58-100B-tokens --outtype f32` the conversion starts and then suddenly stops.

I get this output : `INFO:hf-to-gguf:Loading model: Llama3-8B-1.58-100B-tokens INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only INFO:hf-to-gguf:Set model parameters INFO:hf-to-gguf:gguf: context length = 8192 INFO:hf-to-gguf:gguf: embedding length = 4096 INFO:hf-to-gguf:gguf: feed forward length = 14336 INFO:hf-to-gguf:gguf: head count = 32 INFO:hf-to-gguf:gguf: key-value head count = 8 INFO:hf-to-gguf:gguf: rope theta = 500000.0 INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05 INFO:hf-to-gguf:gguf: file type = 0 INFO:hf-to-gguf:Set model tokenizer INFO:gguf.vocab:Adding 280147 merge(s). INFO:gguf.vocab:Setting special token type bos to 128000 INFO:gguf.vocab:Setting special token type eos to 128009 INFO:gguf.vocab:Setting chat_template to {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>

' }}{% endif %} INFO:hf-to-gguf:Exporting model to 'models/Llama3-8B-1.58-100B-tokens/ggml-model-f32.gguf' INFO:hf-to-gguf:gguf: loading model part 'model.safetensors' INFO:hf-to-gguf:gguf: loading model part 'model.safetensors' INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F32, shape = {4096, 128256} INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F32, shape = {4096, 128256} INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.0.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.0.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.0.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.0.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.1.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.1.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.1.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.1.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.10.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.10.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.10.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.10.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.11.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.11.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.11.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.11.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} 
INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.12.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.12.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.12.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.12.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.13.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.13.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.13.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.13.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.14.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.14.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.14.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.14.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} ^C`

And logs : `INFO:hf-to-gguf:Loading model: Llama3-8B-1.58-100B-tokens INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only INFO:hf-to-gguf:Set model parameters INFO:hf-to-gguf:gguf: context length = 8192 INFO:hf-to-gguf:gguf: embedding length = 4096 INFO:hf-to-gguf:gguf: feed forward length = 14336 INFO:hf-to-gguf:gguf: head count = 32 INFO:hf-to-gguf:gguf: key-value head count = 8 INFO:hf-to-gguf:gguf: rope theta = 500000.0 INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05 INFO:hf-to-gguf:gguf: file type = 0 INFO:hf-to-gguf:Set model tokenizer INFO:gguf.vocab:Adding 280147 merge(s). INFO:gguf.vocab:Setting special token type bos to 128000 INFO:gguf.vocab:Setting special token type eos to 128009 INFO:gguf.vocab:Setting chat_template to {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>

' }}{% endif %} INFO:hf-to-gguf:Exporting model to 'models/Llama3-8B-1.58-100B-tokens/ggml-model-f32.gguf' INFO:hf-to-gguf:gguf: loading model part 'model.safetensors' INFO:hf-to-gguf:gguf: loading model part 'model.safetensors' INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F32, shape = {4096, 128256} INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F32, shape = {4096, 128256} INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.0.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.0.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.0.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.0.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.1.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.1.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.1.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.1.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.10.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.10.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.10.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.10.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.11.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.11.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.11.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.11.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} 
INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.12.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.12.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.12.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.12.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.13.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.13.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.13.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.13.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336} INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.14.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.14.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.14.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096} INFO:hf-to-gguf:blk.14.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024} INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096} INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096} INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336} `

Please help me with this, thanks

wilddolphin2022 commented 2 days ago

Possibly not enough RAM. I'd look for ways to run this on a machine with 128 GB.
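For anyone hitting the same stop on Colab, a quick way to check whether memory (or disk, since an f32 GGUF of an 8B model is roughly 32 GB) is the bottleneck is to inspect the VM before and during the conversion. This is a minimal sketch using standard Linux tools available in a Colab shell; the `dmesg` check may be restricted depending on the runtime. In a notebook cell, prefix each command with `!`:

```bash
# Show total/used/free RAM and swap on the VM
free -h

# Show free disk space where the converted GGUF will be written
df -h models/Llama3-8B-1.58-100B-tokens

# If the kernel OOM-killer terminated the converter, it usually leaves a trace
# in the kernel log (may require elevated privileges on some runtimes)
dmesg 2>/dev/null | grep -i -E "out of memory|killed process" | tail -n 5
```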

ApurvPujari commented 1 day ago

Possibly not enough RAM. I'd look for ways to run this on a machine with 128 GB.

No, it's not a RAM issue. Hardly 1 GB was used...

wilddolphin2022 commented 1 day ago

How do you know it didn't fail from that, when the cancellation happens during loading itself?

AgungPambudi commented 1 day ago

I am using Google Colab. I downloaded the "Llama3-8B-1.58-100B-tokens" model, but when I run `!python utils/convert-hf-to-gguf-bitnet.py models/Llama3-8B-1.58-100B-tokens --outtype f32` the conversion starts and then suddenly stops. [full output and logs quoted above]

Please help me with this, thanks

I tried the Llama3-8B-1.58-100B-tokens model and hit the same error. What worked for me was using the bitnet_b1_58-3B model instead. You can check it out at: https://www.kaggle.com/code/agungpambudi/bitnet-faster-inference-framework-for-1-bit-llm/ .

You can also download the model already converted to GGUF from there.
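For reference, the same workflow with the smaller model looks roughly like this (a sketch, assuming the weights come from the 1bitLLM/bitnet_b1_58-3B Hugging Face repo; adjust the path if you downloaded them elsewhere):

```bash
# Download the 3B BitNet weights into the models directory (assumed repo id)
huggingface-cli download 1bitLLM/bitnet_b1_58-3B --local-dir models/bitnet_b1_58-3B

# Convert to an f32 GGUF with the same script used above; the smaller model
# needs far less RAM and disk than the 8B one
python utils/convert-hf-to-gguf-bitnet.py models/bitnet_b1_58-3B --outtype f32
```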


wilddolphin2022 commented 1 day ago

Got the original model working with 20 GiB of swap. My RAM is 32 GB.
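For others who want to try the same fix, this is roughly how a 20 GiB swap file is created on a regular Linux machine with root access (a sketch; hosted Colab runtimes typically don't allow enabling swap inside the container):

```bash
# Reserve a 20 GiB file to back the swap space
sudo fallocate -l 20G /swapfile

# Restrict permissions so only root can read/write it
sudo chmod 600 /swapfile

# Format it as swap and enable it
sudo mkswap /swapfile
sudo swapon /swapfile

# Verify the extra swap is visible
free -h
```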