m7mdhka closed this issue 2 weeks ago
Have you installed MS-Build support for the LLVM toolset (clang)? You can search for "clang" in the installer shown in the picture you shared and try again.
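For anyone checking this, a minimal sketch of how one might verify the clang tools are actually reachable before rebuilding; it only inspects PATH, so run it from a "Developer Command Prompt for VS 2022" (a plain terminal will usually report "not found" even when the component is installed):

```python
# Sanity-check that the LLVM/clang MSBuild tools and CMake are on PATH.
import shutil

for tool in ("clang", "clang-cl", "cmake"):
    path = shutil.which(tool)
    print(f"{tool}: {path if path else 'not found on PATH'}")
```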
@sd983527 Yes, it's already installed. Maybe it's because my OS is Windows Server, or because I'm using an AMD processor? I tried it on another PC (Intel i7, Windows 11) and it works.
Hi. Please help. I have Windows 11 as in the image above and Visual Studio 2022 with the optional components shown above, yet the command "python setup_env.py -md models/Llama3-8B-1.58-100B-tokens -q i2_s" ends with the output below...
INFO:root:Compiling the code using CMake.
INFO:root:Loading model from directory models/Llama3-8B-1.58-100B-tokens.
INFO:root:Converting HF model to GGUF format...
ERROR:root:Error occurred while running command: Command '['C:\Users\edindi\AppData\Local\Programs\Python\Python312\python.exe', 'utils/convert-hf-to-gguf-bitnet.py', 'models/Llama3-8B-1.58-100B-tokens', '--outtype', 'f32']' returned non-zero exit status 3221225477., check details in logs\convert_to_f32_gguf.log
The "logs\convert_to_f32_gguf.log" has nothing very informative about the error...
INFO:hf-to-gguf:Loading model: Llama3-8B-1.58-100B-tokens
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 0
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 280147 merge(s).
INFO:gguf.vocab:Setting special token type bos to 128000
INFO:gguf.vocab:Setting special token type eos to 128009
INFO:gguf.vocab:Setting chat_template to {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>
' }}{% endif %}
INFO:hf-to-gguf:Exporting model to 'models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf'
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F32, shape = {4096, 128256}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F32, shape = {4096, 128256}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024}
[The log then repeats the same conversion lines for blk.1 and blk.10 through blk.13, and ends abruptly after blk.14.ffn_gate.weight with no error message.]
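As a side note, the exit status in the error above is easier to read in hex. A one-liner in plain Python (nothing project-specific) shows it is the Windows access-violation status, which is consistent with the out-of-memory explanation below, since a native crash under memory pressure can surface this way:

```python
# Decode the non-zero exit status reported by setup_env.py.
# 3221225477 == 0xC0000005, the Windows NTSTATUS for an access violation.
print(hex(3221225477))  # -> 0xc0000005
```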
The issue may be caused by running out of RAM: the conversion stage requires much more memory than the inference stage does. I'd suggest testing with the 700 MB model on this machine, or converting the model file on another machine with more memory.
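A minimal sketch of the kind of pre-flight check that could be run before retrying, assuming psutil is installed; the 32 GiB threshold and the models/bitnet_b1_58-large directory name are illustrative assumptions, not values from this repo:

```python
# Rough pre-flight RAM check before retrying the HF -> GGUF conversion.
# psutil is a third-party package: pip install psutil
import psutil

avail_gb = psutil.virtual_memory().available / 1024**3
print(f"Available RAM: {avail_gb:.1f} GiB")

# An 8B-parameter model exported at F32 is roughly 8e9 * 4 bytes ~= 32 GB,
# so conversion needs far more headroom than running the quantized model.
if avail_gb < 32:  # threshold is an illustrative assumption
    print("Likely too little RAM for the 8B conversion; try the small model, e.g.:")
    print("  python setup_env.py -md models/bitnet_b1_58-large -q i2_s")
```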
Let me try a few things in light of this and report back. Thanks.
I got this problem when I used it: