Open seghier opened 1 month ago
Could you please provide more details? Which command is extremely slow?
Which compiler?
Ubuntu 20.04 Clang-18
main: llama threadpool init, n_threads = 2
system_info: n_threads = 2 (n_threads_batch = 2) / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampler seed: 4294967295
sampler params:
    repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
    top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
    mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> greedy
generate: n_ctx = 2048, n_batch = 1, n_predict = 6, n_keep = 1
Daniel went back to the the the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary? Answer: Mary is in the garden.
llama_perf_sampler_print: sampling time = 1.56 ms / 54 runs ( 0.03 ms per token, 34526.85 tokens per second)
llama_perf_context_print: load time = 1756.28 ms
llama_perf_context_print: prompt eval time = 36718.06 ms / 48 tokens ( 764.96 ms per token, 1.31 tokens per second)
llama_perf_context_print: eval time = 3840.11 ms / 5 runs ( 768.02 ms per token, 1.30 tokens per second)
llama_perf_context_print: total time = 40564.05 ms / 53 tokens
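For reference, the per-token figures in the timing lines above can be recomputed from the reported totals. The snippet below is only a sanity-check sketch: the `throughput` helper is a hypothetical name, not part of llama.cpp, and the numbers are copied by hand from the log. Prompt processing and generation come out at roughly the same rate, which would be consistent with the prompt being fed one token at a time (the log shows `n_batch = 1`), though that is an inference from the log rather than a confirmed diagnosis.

```python
def throughput(total_ms, n_tokens):
    """Return (ms per token, tokens per second) for a timed run.

    Hypothetical helper for checking the log arithmetic; not a llama.cpp API.
    """
    ms_per_token = total_ms / n_tokens
    tokens_per_second = 1000.0 / ms_per_token
    return ms_per_token, tokens_per_second


# Values copied from the llama_perf_context_print lines in the log.
prompt_ms_per_tok, prompt_tps = throughput(36718.06, 48)  # prompt eval
gen_ms_per_tok, gen_tps = throughput(3840.11, 5)          # eval (generation)

print(f"prompt eval: {prompt_ms_per_tok:.2f} ms/token, {prompt_tps:.2f} tokens/s")
print(f"generation:  {gen_ms_per_tok:.2f} ms/token, {gen_tps:.2f} tokens/s")
```

The rounded results match the figures printed by the tool (about 765 ms per token in both phases).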
Extremely slow in CPU mode