microsoft / T-MAC
Low-bit LLM inference on CPU with lookup table
MIT License · 588 stars · 44 forks
Issues
#70 [open] The perplexity tool returns abnormal values (ppp-max, 1 week ago, 4 comments)
#69 [closed] Update scripts (QingtaoLi1, 1 week ago, 0 comments)
#68 [closed] [Fix] Typo in README.md (lhpqaq, 1 week ago, 1 comment)
#67 [open] Request for Documentation on LUT Quantization Theory and Generation Methods for LUT_Biases and LUT_Scales (zhouexellent, 3 weeks ago, 1 comment)
#66 [open] History tune.log may bypass kernel generation configurations (QingtaoLi1, 1 month ago, 0 comments)
#65 [open] The perplexity tool returns unexpected ppl results (fefang, 1 month ago, 1 comment)
#64 [closed] Latest version gets worse performance (qw1319, 1 month ago, 0 comments)
#63 [closed] Latest T-MAC version run error (qw1319, 1 month ago, 2 comments)
#62 [closed] Cannot get llama-2-7b-4bit quant model to run normally (Zijie-Tian, 1 week ago, 1 comment)
#61 [open] Why is there no difference in the E2E performance of T-MAC and llama.cpp on an ARM machine? (ppp-max, 1 month ago, 3 comments)
#60 [closed] Why is there no difference in the E2E performance of T-MAC and llama.cpp on an ARM machine? (ppp-max, 1 month ago, 0 comments)
#59 [open] About GPTQ quantization tricks (sinoaidi, 1 month ago, 0 comments)
#57 [closed] When I change kfactor to larger than 16, I cannot get a correct result (qw1319, 1 month ago, 3 comments)
#56 [open] The log is stuck on the "running build_py" line for 2 hours, is this normal? (sunj0104, 1 month ago, 9 comments)
#55 [closed] ValueError: operands could not be broadcast together with shapes during GEMV kernel profiling with T-MAC (KoalaYuFeng, 1 month ago, 3 comments)
#54 [closed] Merge latest llama.cpp with OpenMP for better multi-threading performance and more models such as Qwen2 (kaleid-liner, 1 month ago, 0 comments)
#53 [open] Android tune kernel question (qw1319, 1 month ago, 8 comments)
#52 [closed] About precision loss (sinoaidi, 1 month ago, 3 comments)
#51 [closed] TVM problem for Android (robo-z, 1 month ago, 5 comments)
#50 [closed] How to profile the perf of 2-bit T-MAC GEMM in llama.cpp (zhewang1-intc, 1 month ago, 4 comments)
#49 [closed] How can I test the performance of a single matrix multiplication in T-MAC? (lijianxing123, 1 month ago, 1 comment)
#48 [open] Does speed depend on the shape of the multiplied matrices? Why is [34,2048]*[2048,5632] much faster than [34,5632]*[5632,2048], given the same FLOPs? (lijianxing123, 2 months ago, 1 comment)
#47 [closed] Compile kernels failed, RuntimeError: Compilation error: /tmp/tmpxinqgwfx/input0.cc:20:10: fatal error: 'type_traits' file not found (xiangzhangpang, 2 months ago, 3 comments)
#46 [closed] Merge llama.cpp b69a480 (kaleid-liner, 1 month ago, 1 comment)
#45 [open] T-MAC 1.0.0 Release Plan (kaleid-liner, 2 months ago, 0 comments)
#44 [closed] How to test kernel performance using your code? (orange-juice1, 2 months ago, 20 comments)
#43 [closed] Compile llama.cpp failed (qw1319, 2 months ago, 2 comments)
#42 [closed] Compile TVM failed (velonica0, 2 months ago, 6 comments)
#41 [open] Does T-MAC support RISC-V CPUs? (velonica0, 2 months ago, 1 comment)
#40 [open] Slow performance compared to original llama.cpp (idreamerhx, 2 months ago, 7 comments)
#39 [open] Performance on mobile phones such as MTK D9000/D8300 or Qualcomm 8 Gen 3 (yuimo, 2 months ago, 2 comments)
#38 [open] [Question] Does T-MAC support mixed-precision LLMs? (AndreaChiChengdu, 2 months ago, 2 comments)
#37 [closed] Where is t-mac-envs.sh? (idreamerhx, 2 months ago, 4 comments)
#36 [closed] [Fix] Try to fix Ubuntu installation (#26) (kaleid-liner, 2 months ago, 1 comment)
#34 [closed] Question about "Bit-serial linear transformation" in the paper (zjnyly, 2 months ago, 1 comment)
#33 [closed] How to get more detailed information (yoghur, 2 months ago, 4 comments)
#32 [open] 8 Gen 3 T-MAC CPU performance issue (AndreaChiChengdu, 2 months ago, 9 comments)
#31 [closed] llama-2-7b 4-bit conversion fails (2-bit is OK) (AndreaChiChengdu, 2 months ago, 3 comments)
#30 [open] How to fully utilize the optimized performance of T-MAC? (ma-hang, 2 months ago, 2 comments)
#29 [closed] Add Android cross-compilation support (feat #12, #18) (kaleid-liner, 2 months ago, 0 comments)
#28 [open] OpenAI-compatible chat completions endpoint (maxim-saplin, 3 months ago, 9 comments)
#27 [closed] [Qwen] Could you please update 3rd/llama.cpp to support Qwen1.5 or Qwen2? (tiger-of-shawn, 1 month ago, 14 comments)
#26 [open] Ubuntu installation problem on Google Colab (amirhosein-darmani, 3 months ago, 25 comments)
#25 [closed] Cannot run compile.py on my Apple M1 Max (begoss, 2 months ago, 5 comments)
#24 [open] Any plans to merge the latest code of llama.cpp? (peytoncai, 3 months ago, 21 comments)
#23 [open] How to build ollama with T-MAC support? (goukisun, 3 months ago, 2 comments)
#22 [closed] Question about run_pipeline.py (sunj0104, 3 months ago, 5 comments)
#21 [closed] How to add new models and their kernels? (jason-zou, 2 months ago, 1 comment)
#20 [closed] Thanks for the excellent work; what happens when the CPU and GPU (or others) run inference at the same time? (aoom, 1 month ago, 1 comment)
#19 [closed] The program hangs when I run "pip install . -v" (spxcds, 3 months ago, 1 comment)