mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License · 634 stars · 59 forks
Issues
#14 Accelerate inference on Intel and M1 with int8 activation quantization (meenchen, closed 11 months ago, 0 comments)
#13 Fix order of weights for our cuda kernel (meenchen, closed 12 months ago, 0 comments)
#12 debug wip (meenchen, closed 11 months ago, 0 comments)
#11 Readme minor update (meenchen, closed 12 months ago, 0 comments)
#10 Minimal demo tutorial (meenchen, closed 12 months ago, 0 comments)
#9 Fix quantizer and update demo parameters and example for smoothed model. (meenchen, closed 1 year ago, 0 comments)
#8 refactor model exporter and upload smoothed model (meenchen, closed 1 year ago, 0 comments)
#7 Support w4a16 with CUDA GPU (meenchen, closed 10 months ago, 0 comments)
#6 Fix profile mode and minor fix of metal kernel (meenchen, closed 1 year ago, 0 comments)
#5 m1 metal for GPU inference (meenchen, closed 1 year ago, 0 comments)
#4 Int4 LLaMA runtime and reference matmul (meenchen, closed 1 year ago, 0 comments)
#3 int4 Intel/M1 kernels (meenchen, closed 1 year ago, 0 comments)
#2 Cleanup tests and assets (meenchen, closed 1 year ago, 0 comments)
#1 LLaMA runtime support (meenchen, closed 1 year ago, 0 comments)