microsoft / BitNet

Official inference framework for 1-bit LLMs
MIT License

returned non-zero exit status 3221225477 #89

Closed: m7mdhka closed this issue 2 weeks ago

m7mdhka commented 3 weeks ago

I got this problem when running it:

(bitnet-cpp) C:\Users\m.rahamneh\Desktop\GP\BitNet>python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s
INFO:root:Compiling the code using CMake.
INFO:root:Downloading model HF1BitLLM/Llama3-8B-1.58-100B-tokens from HuggingFace to models\Llama3-8B-1.58-100B-tokens...
INFO:root:Converting HF model to GGUF format...
INFO:root:GGUF model saved at models\Llama3-8B-1.58-100B-tokens\ggml-model-i2_s.gguf

(bitnet-cpp) C:\Users\m.rahamneh\Desktop\GP\BitNet>python run_inference.py -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf -p "Daniel went back to the the the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\nAnswer:" -n 6 -temp 0
warning: not compiled with GPU offload support, --gpu-layers option will be ignored
warning: see main README.md for information on enabling GPU BLAS support
Error occurred while running command: Command '['build\\bin\\Release\\llama-cli.exe', '-m', 'models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf', '-n', '6', '-t', '2', '-p', 'Daniel went back to the the the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\\nAnswer:', '-ngl', '0', '-c', '2048', '--temp', '0.0', '-b', '1']' returned non-zero exit status 3221225477.
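(For context, a quick check outside BitNet: exit status 3221225477 is the unsigned form of the Windows NTSTATUS code 0xC0000005, STATUS_ACCESS_VIOLATION, meaning llama-cli.exe itself crashed and Python only relayed the exit code, which is why no traceback appears.)

# decode_exit_status.py - illustrative helper, not part of BitNet
code = 3221225477
print(hex(code))  # prints 0xc0000005
# 0xC0000005 is the Windows NTSTATUS value STATUS_ACCESS_VIOLATION,
# i.e. the native llama-cli.exe process crashed with a memory access violation.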

(screenshot attached)

sd983527 commented 3 weeks ago

Have you installed "MS-Build support for LLVM toolset (clang)"? You can search for "clang" in the Visual Studio Installer shown in the picture you shared, install that component, and try again.
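(As a quick sanity check, something like the snippet below, run from the same environment and VS 2022 Developer Command Prompt used for setup_env.py, shows whether clang and cmake are actually on PATH; the script is just an illustrative sketch, not part of BitNet.)

# check_toolchain.py - illustrative helper, not part of BitNet
import shutil

# All three should resolve to a path once the clang components are installed
# and the shell has the Visual Studio environment loaded.
for tool in ("clang", "clang-cl", "cmake"):
    print(f"{tool:10s} -> {shutil.which(tool)}")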

m7mdhka commented 3 weeks ago

@sd983527 Yes, it's already installed. Maybe it's because my OS is Windows Server, or because I'm using an AMD processor? I tried it on another PC (Intel i7, Windows 11) and it works there.
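(If it helps to compare the failing and working machines, a few standard-library calls capture the OS and CPU details; purely an illustrative snippet.)

# system_info.py - illustrative helper, not part of BitNet
import platform

print("OS:  ", platform.platform())   # e.g. Windows Server vs. Windows 11 build
print("CPU: ", platform.processor())  # e.g. AMD vs. Intel processor string
print("Arch:", platform.machine())    # e.g. AMD64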

edindirangu commented 3 weeks ago

(two screenshots attached)

Hi, please help. I have Windows 11 and Visual Studio 2022 with the optional components shown in the screenshots above, yet the command "python setup_env.py -md models/Llama3-8B-1.58-100B-tokens -q i2_s" ends with the output below...

INFO:root:Compiling the code using CMake.
INFO:root:Loading model from directory models/Llama3-8B-1.58-100B-tokens.
INFO:root:Converting HF model to GGUF format...
ERROR:root:Error occurred while running command: Command '['C:\Users\edindi\AppData\Local\Programs\Python\Python312\python.exe', 'utils/convert-hf-to-gguf-bitnet.py', 'models/Llama3-8B-1.58-100B-tokens', '--outtype', 'f32']' returned non-zero exit status 3221225477., check details in logs\convert_to_f32_gguf.log

The "logs\convert_to_f32_gguf.log" has nothing very informative about the error...

INFO:hf-to-gguf:Loading model: Llama3-8B-1.58-100B-tokens
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 0
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 280147 merge(s).
INFO:gguf.vocab:Setting special token type bos to 128000
INFO:gguf.vocab:Setting special token type eos to 128009
INFO:gguf.vocab:Setting chat_template to {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>

' }}{% endif %}
INFO:hf-to-gguf:Exporting model to 'models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf'
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F32, shape = {4096, 128256}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F32, shape = {4096, 128256}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096}
INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096}
INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.10.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.10.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096}
INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.11.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.11.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096}
INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.12.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.12.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096}
INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.uint8 --> F32, shape = {4096, 14336}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.attn_k.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.13.attn_q.weight, torch.uint8 --> F32, shape = {4096, 4096}
INFO:hf-to-gguf:blk.13.attn_v.weight, torch.uint8 --> F32, shape = {4096, 1024}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.uint8 --> F32, shape = {14336, 4096}
INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.uint8 --> F32, shape = {4096, 14336}
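
(For what it's worth, one way to get more detail than the log above is to re-run the failing converter command from the error message directly with Python's faulthandler enabled, so a hard crash at least reports the active Python frame. A sketch, reusing the exact command from the error:)

# rerun_convert.py - illustrative sketch, not part of BitNet
import subprocess, sys

# Re-run the conversion step that setup_env.py reported as failing
# (arguments copied from the ERROR line above). "-X faulthandler" makes
# CPython try to dump a traceback even on a fatal access violation.
subprocess.run([
    sys.executable, "-X", "faulthandler",
    "utils/convert-hf-to-gguf-bitnet.py",
    "models/Llama3-8B-1.58-100B-tokens",
    "--outtype", "f32",
], check=True)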

sd983527 commented 3 weeks ago

The issue may be caused by running out of RAM, since the converting stage requires much more memory than the actual inference stage. I'd suggest you test with the 700 MB model on this machine, or convert the model file on another machine with more memory.
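
(If the out-of-memory theory is right, a quick way to see how much RAM is actually free before the conversion step, using only the standard library; an illustrative check, not part of the repo. The f32 intermediate for an 8B model is roughly 8e9 parameters x 4 bytes, about 32 GB, which hints at why conversion is far hungrier than inference.)

# check_ram.py - illustrative helper for Windows, not part of BitNet
import ctypes

class MEMORYSTATUSEX(ctypes.Structure):
    # Layout of the Win32 MEMORYSTATUSEX structure used by GlobalMemoryStatusEx.
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),
        ("ullAvailPageFile", ctypes.c_ulonglong),
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

status = MEMORYSTATUSEX()
status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status))
print(f"Total RAM:     {status.ullTotalPhys / 2**30:.1f} GiB")
print(f"Available RAM: {status.ullAvailPhys / 2**30:.1f} GiB")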

edindirangu commented 3 weeks ago

> The issue may be caused by running out of RAM, since the converting stage requires much more memory than the actual inference stage. I'd suggest you test with the 700 MB model on this machine, or convert the model file on another machine with more memory.

Let me try a few things in light of this and I'll report back. Thanks.