Closed begoss closed 1 week ago
Please follow the guide. The pip install . command will download llvm+clang=17.0.6 and build TVM for you.
Thanks for the reply. I fixed PLATFORM_LLVM_MAP and tried the pip install . command; it downloaded llvm+clang=17.0.6 and built TVM automatically. Then I ran python tools/run_pipeline.py -o /Users/huhao/Desktop/Project/LLM/Models/bitnet_b1_58-3B/bitnet_b1_58-3B, but it still stopped at STEP.0:
Running STEP.0: Compile kernels
Running command in /Users/huhao/Desktop/Project/LLM/T-MAC/deploy:
python compile.py -o tuned -da -nt 4 -tb -gc -gs 128 -ags 64 -t -m hf-bitnet-3b -md /Users/huhao/Desktop/Project/LLM/Models/bitnet_b1_58-3B/bitnet_b1_58-3B
Please check logs/2024-08-21-14-46-35.log for what's wrong
The error log was the same:
Traceback (most recent call last):
File "compile.py", line 240, in <module>
main()
File "compile.py", line 230, in main
compile(**device_kwargs)
File "compile.py", line 126, in compile
qgemm_mod = qgemm_lut.compile(
File "/opt/anaconda3/envs/tvm-build-test/lib/python3.8/site-packages/t_mac/ops/base.py", line 255, in compile
self.tuning(*args, n_trial=n_trial, thread_affinity=thread_affinity, **eval_kwargs)
File "/opt/anaconda3/envs/tvm-build-test/lib/python3.8/site-packages/t_mac/ops/base.py", line 95, in tuning
task = autotvm.task.create(template_name, args=args, target=self.target)
File "/Users/huhao/Desktop/Project/LLM/T-MAC/3rdparty/tvm/python/tvm/autotvm/task/task.py", line 480, in create
sch, _ = ret.func(*args)
File "/Users/huhao/Desktop/Project/LLM/T-MAC/3rdparty/tvm/python/tvm/autotvm/task/task.py", line 240, in __call__
return self.fcustomized(*args, **kwargs)
File "/opt/anaconda3/envs/tvm-build-test/lib/python3.8/site-packages/t_mac/ops/base.py", line 72, in _func
sch = self._schedule(tensors)
File "/opt/anaconda3/envs/tvm-build-test/lib/python3.8/site-packages/t_mac/ops/qgemm.py", line 233, in _schedule
intrin, ll_code, header_code, body_code = tbl(
File "/opt/anaconda3/envs/tvm-build-test/lib/python3.8/site-packages/t_mac/intrins/tbl.py", line 166, in tbl
ll_code, header_code, body_code = _create_llvm("tbl.cc", body_code, cc, cc_opts)
File "/opt/anaconda3/envs/tvm-build-test/lib/python3.8/site-packages/t_mac/intrins/utils.py", line 23, in _create_llvm
ll_code = clang.create_llvm(
File "/Users/huhao/Desktop/Project/LLM/T-MAC/3rdparty/tvm/python/tvm/contrib/clang.py", line 107, in create_llvm
raise RuntimeError(msg)
RuntimeError: Compilation error:
/var/folders/bp/lv2qvml94f1fz9tzrtv0snkc0000gn/T/tmp0lajl8tn/input0.cc:354:42: error: always_inline function 'vcvtq_f16_s16' requires target feature 'fullfp16', but would be inlined into function 'tbl_g4_int8_float_update_impl' that is compiled without support for 'fullfp16'
float16x8_t vec_v_bot_low = vcvtq_f16_s16(adder_bot.get_low());
^
/var/folders/bp/lv2qvml94f1fz9tzrtv0snkc0000gn/T/tmp0lajl8tn/input0.cc:355:42: error: always_inline function 'vcvtq_f16_s16' requires target feature 'fullfp16', but would be inlined into function 'tbl_g4_int8_float_update_impl' that is compiled without support for 'fullfp16'
float16x8_t vec_v_bot_high = vcvtq_f16_s16(adder_bot.get_high());
^
/var/folders/bp/lv2qvml94f1fz9tzrtv0snkc0000gn/T/tmp0lajl8tn/input0.cc:356:42: error: always_inline function 'vcvtq_f16_s16' requires target feature 'fullfp16', but would be inlined into function 'tbl_g4_int8_float_update_impl' that is compiled without support for 'fullfp16'
float16x8_t vec_v_top_low = vcvtq_f16_s16(adder_top.get_low());
^
/var/folders/bp/lv2qvml94f1fz9tzrtv0snkc0000gn/T/tmp0lajl8tn/input0.cc:357:42: error: always_inline function 'vcvtq_f16_s16' requires target feature 'fullfp16', but would be inlined into function 'tbl_g4_int8_float_update_impl' that is compiled without support for 'fullfp16'
float16x8_t vec_v_top_high = vcvtq_f16_s16(adder_top.get_high());
^
4 errors generated.
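For context, this clang diagnostic means the NEON intrinsic vcvtq_f16_s16 requires the ARMv8.2 fullfp16 extension, which the compiler only enables when the -mcpu/-march flags imply it; a generic target leaves it disabled. A minimal sketch of how per-CPU compile flags could be selected (the helper and map names are hypothetical, not T-MAC's actual code; the flag spellings themselves are real clang options):

```python
# Illustrative only: map an Apple Silicon CPU name to clang flags that
# enable the 'fullfp16' target feature needed by vcvtq_f16_s16.
APPLE_SILICON_MCPU = {
    "m1": "apple-m1",
    "m2": "apple-m2",
}

def clang_fp16_flags(cpu: str) -> list:
    """Return clang flags that imply the fullfp16 feature for `cpu`.

    Falls back to an explicit -march with the +fp16 extension when the
    CPU name is unknown, instead of a bare 'generic' target (which
    would reproduce the error above).
    """
    mcpu = APPLE_SILICON_MCPU.get(cpu.lower())
    if mcpu is not None:
        return ["-mcpu={}".format(mcpu)]
    return ["-march=armv8.2-a+fp16"]
```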
I also tried modifying config.cmake:
set(USE_LLVM "/Users/huhao/Desktop/Project/LLM/T-MAC/build/clang+llvm-17.0.6-arm64-apple-darwin22.0/bin/llvm-config")
Then I rebuilt TVM, and it used clang+llvm-17.0.6 successfully:
...
-- LLVM links against zlib
-- Found ZLIB: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk/usr/lib/libz.tbd (found version "1.2.12")
-- Found zstd: /opt/homebrew/lib/libzstd.dylib
-- LLVM links against static zstd
-- LLVM linker flag: -lcurses
-- LLVM links against xml2
-- Found LLVM_INCLUDE_DIRS=/Users/huhao/Desktop/Project/LLM/T-MAC/build/clang+llvm-17.0.6-arm64-apple-darwin22.0/include
-- Found LLVM_DEFINITIONS=-D__STDC_CONSTANT_MACROS;-D__STDC_FORMAT_MACROS;-D__STDC_LIMIT_MACROS
-- Found LLVM_LIBS=/Users/huhao/Desktop/Project/LLM/T-MAC/build/clang+llvm-17.0.6-arm64-apple-darwin22.0/lib/libLLVMWindowsManifest.a;/Users/huhao/Desktop/Project/LLM/T-MAC/build/clang+llvm-17.0.6-arm64-apple-darwin22.0/lib/libLLVMXRay.a;...
...
[100%] Building CXX object CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/random/random.cc.o
[100%] Building CXX object CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o
[100%] Built target tvm_runtime_objs
[100%] Linking CXX shared library libtvm_runtime.dylib
[100%] Linking CXX shared library libtvm.dylib
ld: warning: -undefined error is deprecated
ld: warning: -undefined error is deprecated
[100%] Built target tvm_runtime
[100%] Built target tvm
But I still got the same error after running run_pipeline.py. How can I solve this? Thank you.
Can you give me more details about the reason? I changed -mcpu to generic, and then I got an error like this:
Are you still changing this line yourself? -mcpu=apple-m2 should work for llvm+clang=17.0.6, even on an Apple M1.
I have pushed a fix to modify the arch map. However, can you confirm whether -mcpu=apple-m2 works for you?
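For readers hitting the same issue, the fix amounts to making the arch map pick an -mcpu whose feature set includes fullfp16 on Apple Silicon rather than a generic target. A rough sketch of such a map (the keys, name, and structure of T-MAC's real PLATFORM_LLVM_MAP may differ; this is only an illustration):

```python
import platform

# Hypothetical sketch of an arch map keyed by (system, machine); the
# actual map in t_mac may be structured differently.
ARCH_MCPU_MAP = {
    ("Darwin", "arm64"): "apple-m2",    # implies fullfp16 on Apple Silicon
    ("Linux", "aarch64"): "cortex-a76",
}

def pick_mcpu(system=None, machine=None, default="generic"):
    """Look up the -mcpu value for the current (or given) platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return ARCH_MCPU_MAP.get((system, machine), default)
```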
I used the latest code, and it worked on my M1 Max. Thank you!
Log start
main: build = 2854 (70c312d)
main: built with Apple clang version 15.0.0 (clang-1500.0.40.1) for arm64-apple-darwin23.4.0
main: seed = 1725344641
[14:24:01] /Users/huhao/Desktop/Project/LLM/T-MAC/3rdparty/llama.cpp/ggml-tmac.cpp:38: ggml_tmac_init
llama_model_loader: loaded meta data with 25 key-value pairs and 288 tensors from /Users/huhao/Desktop/Project/LLM/Models/bitnet_b1_58-3B/bitnet_b1_58-3B/ggml-model.in.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = bitnet
llama_model_loader: - kv 1: general.name str = bitnet_b1_58-3B
llama_model_loader: - kv 2: bitnet.block_count u32 = 26
llama_model_loader: - kv 3: bitnet.context_length u32 = 2048
llama_model_loader: - kv 4: bitnet.embedding_length u32 = 3200
llama_model_loader: - kv 5: bitnet.feed_forward_length u32 = 8640
llama_model_loader: - kv 6: bitnet.attention.head_count u32 = 32
llama_model_loader: - kv 7: bitnet.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: bitnet.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 9: bitnet.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 32
llama_model_loader: - kv 11: bitnet.vocab_size u32 = 32002
llama_model_loader: - kv 12: bitnet.rope.scaling.type str = linear
llama_model_loader: - kv 13: bitnet.rope.scaling.factor f32 = 1.000000
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.pre str = default
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,32002] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,32002] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,32002] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.padding_token_id u32 = 32000
llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 24: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - type f32: 105 tensors
llama_model_loader: - type f16: 1 tensors
llama_model_loader: - type i2: 182 tensors
llm_load_vocab: special tokens definition check successful ( 261/32002 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = bitnet
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32002
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 3200
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 26
llm_load_print_meta: n_rot = 100
llm_load_print_meta: n_embd_head_k = 100
llm_load_print_meta: n_embd_head_v = 100
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 3200
llm_load_print_meta: n_embd_v_gqa = 3200
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8640
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = IN
llm_load_print_meta: model params = 3.32 B
llm_load_print_meta: model size = 965.21 MiB (2.44 BPW)
llm_load_print_meta: general.name = bitnet_b1_58-3B
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 32000 '</line>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.14 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/27 layers to GPU
llm_load_tensors: CPU buffer size = 965.22 MiB
.................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 650.00 MiB
llama_new_context_with_model: KV self size = 650.00 MiB, K (f16): 325.00 MiB, V (f16): 325.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: CPU compute buffer size = 144.51 MiB
llama_new_context_with_model: graph nodes = 942
llama_new_context_with_model: graph splits = 1
system_info: n_threads = 4 / 10 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 2048, n_batch = 2048, n_predict = 128, n_keep = 1
<s> Microsoft Corporation is an American multinational corporation and technology company headquartered in Redmond, Washington.Clients include the world’s largest and most influential companies across 190 countries.
Microsoft is the largest software company in the world, and one of the most valuable corporations in the world.
Microsoft has been at the forefront of innovation in technology, and their products are used by millions of people every day.
Microsoft has a strong commitment to technology, and their products are used by millions of people every day.
Microsoft Corporation is an American multinational corporation and technology company headquartered in Redmond, Washington.
Microsoft has been at the forefront of innovation in
llama_print_timings: load time = 183.89 ms
llama_print_timings: sample time = 2.99 ms / 128 runs ( 0.02 ms per token, 42780.75 tokens per second)
llama_print_timings: prompt eval time = 300.17 ms / 24 tokens ( 12.51 ms per token, 79.95 tokens per second)
llama_print_timings: eval time = 2572.80 ms / 127 runs ( 20.26 ms per token, 49.36 tokens per second)
llama_print_timings: total time = 2893.09 ms / 151 tokens
Log end
I built T-MAC and TVM with my Apple M1 Max successfully, then I ran
python tools/run_pipeline.py -o /Users/huhao/Desktop/Project/LLM/Models/bitnet_b1_58-3B/bitnet_b1_58-3B
and it got an error. Here is my clang & llvm version:
I changed -mcpu to generic, and then I got an error like this:
How can I solve this and run on my M1 successfully?