microsoft / T-MAC
Low-bit LLM inference on CPU with lookup table
MIT License · 588 stars · 44 forks
Issues
#70 [open] The perplexity tool returns abnormal values (ppp-max, 1 week ago, 4 comments)
#69 [closed] Update scripts (QingtaoLi1, 1 week ago, 0 comments)
#68 [closed] [Fix] Typo in README.md (lhpqaq, 1 week ago, 1 comment)
#67 [open] Request for Documentation on LUT Quantization Theory and Generation Methods for LUT_Biases and LUT_Scales (zhouexellent, 3 weeks ago, 1 comment)
#66 [open] History tune.log may bypass kernel generation configurations (QingtaoLi1, 1 month ago, 0 comments)
#65 [open] The perplexity tool returns unexpected ppl results (fefang, 1 month ago, 1 comment)
#64 [closed] Latest version gets worse performance (qw1319, 1 month ago, 0 comments)
#63 [closed] Latest T-MAC version run error (qw1319, 1 month ago, 2 comments)
#62 [closed] Cannot get llama-2-7b-4bit quant model to run normally (Zijie-Tian, 1 week ago, 1 comment)
#61 [open] Why is there no difference in the E2E performance of T-MAC and llama.cpp on an ARM machine? (ppp-max, 1 month ago, 3 comments)
#60 [closed] Why is there no difference in the E2E performance of T-MAC and llama.cpp on an ARM machine? (ppp-max, 1 month ago, 0 comments)
#59 [open] About GPTQ quantization tricks (sinoaidi, 1 month ago, 0 comments)
#57 [closed] When I change kfactor to larger than 16, I cannot get a correct result (qw1319, 1 month ago, 3 comments)
#56 [open] The log is stuck on the "running build_py" line for 2 hours, is this normal? (sunj0104, 1 month ago, 9 comments)
#55 [closed] ValueError: operands could not be broadcast together with shapes during GEMV kernel profiling with T-MAC (KoalaYuFeng, 1 month ago, 3 comments)
#54 [closed] Merge latest llama.cpp with OpenMP for better multi-threading performance and more models such as Qwen2 (kaleid-liner, 1 month ago, 0 comments)
#53 [open] Android tune kernel question (qw1319, 1 month ago, 8 comments)
#52 [closed] About precision loss (sinoaidi, 1 month ago, 3 comments)
#51 [closed] TVM problem for Android (robo-z, 1 month ago, 5 comments)
#50 [closed] How to profile the perf of 2-bit T-MAC GEMM in llama.cpp (zhewang1-intc, 1 month ago, 4 comments)
#49 [closed] How can I test the performance of a single matrix multiplication in T-MAC? (lijianxing123, 1 month ago, 1 comment)
#48 [open] Does speed depend on the shape of the multiplied matrices? Why is [34,2048]*[2048,5632] much faster than [34,5632]*[5632,2048], given the same FLOPs? (lijianxing123, 2 months ago, 1 comment)
#47 [closed] Compile kernels failed, RuntimeError: Compilation error: /tmp/tmpxinqgwfx/input0.cc:20:10: fatal error: 'type_traits' file not found (xiangzhangpang, 2 months ago, 3 comments)
#46 [closed] Merge llama.cpp b69a480 (kaleid-liner, 1 month ago, 1 comment)
#45 [open] T-MAC 1.0.0 Release Plan (kaleid-liner, 2 months ago, 0 comments)
#44 [closed] How to test kernel performance using your code? (orange-juice1, 2 months ago, 20 comments)
#43 [closed] Compile llama.cpp failed (qw1319, 2 months ago, 2 comments)
#42 [closed] Compile TVM failed (velonica0, 2 months ago, 6 comments)
#41 [open] Does T-MAC support RISC-V CPUs? (velonica0, 2 months ago, 1 comment)
#40 [open] Slow performance compared to original llama.cpp (idreamerhx, 2 months ago, 7 comments)
#39 [open] Performance on mobile phones such as MTK D9000/D8300 or Qualcomm 8 Gen 3 (yuimo, 2 months ago, 2 comments)
#38 [open] [Question] Does T-MAC support mixed-precision LLMs? (AndreaChiChengdu, 2 months ago, 2 comments)
#37 [closed] Where is t-mac-envs.sh? (idreamerhx, 2 months ago, 4 comments)
#36 [closed] [Fix] Try to fix Ubuntu installation (#26) (kaleid-liner, 2 months ago, 1 comment)
#34 [closed] Question about "Bit-serial linear transformation" in the paper (zjnyly, 2 months ago, 1 comment)
#33 [closed] How to get more detailed information (yoghur, 2 months ago, 4 comments)
#32 [open] 8 Gen 3 T-MAC CPU performance issue (AndreaChiChengdu, 2 months ago, 9 comments)
#31 [closed] llama-2-7b 4-bit conversion fails (2-bit is OK) (AndreaChiChengdu, 2 months ago, 3 comments)
#30 [open] How to fully utilize the optimized performance of T-MAC? (ma-hang, 2 months ago, 2 comments)
#29 [closed] Add Android cross-compilation support (feat #12, #18) (kaleid-liner, 2 months ago, 0 comments)
#28 [open] OpenAI-compatible chat completions endpoint (maxim-saplin, 3 months ago, 9 comments)
#27 [closed] [Qwen] Could you please update 3rd/llama.cpp to support Qwen1.5 or Qwen2? (tiger-of-shawn, 1 month ago, 14 comments)
#26 [open] Ubuntu installation problem on Google Colab (amirhosein-darmani, 3 months ago, 25 comments)
#25 [closed] Cannot run compile.py on my Apple M1 Max (begoss, 2 months ago, 5 comments)
#24 [open] Any plans to merge the latest code of llama.cpp? (peytoncai, 3 months ago, 21 comments)
#23 [open] How to build ollama with T-MAC support? (goukisun, 3 months ago, 2 comments)
#22 [closed] Question about run_pipeline.py (sunj0104, 3 months ago, 5 comments)
#21 [closed] How to add new models and their kernels? (jason-zou, 2 months ago, 1 comment)
#20 [closed] Thanks for the excellent work; what happens when the CPU and GPU (or others) run inference at the same time? (aoom, 1 month ago, 1 comment)
#19 [closed] The program hangs when I run "pip install . -v" (spxcds, 3 months ago, 1 comment)