mit-han-lab / TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License · 760 stars · 73 forks
Issues (newest first)
All 21 issues were opened by meenchen; each is closed, was last updated 1 year ago, and has 0 comments.

#21 update chatbot UI
#20 Minor clean up and fix avx reference imp.
#19 clean avx kernel
#18 Add a gif for README
#17 Add llama2 and clean up codebase
#16 Changes to support windows
#15 Support torch weight dict with Model exporter
#14 Accelerate inference on Intel and M1 with int8 activation quantization
#13 Fix order of weights for our cuda kernel
#12 debug wip
#11 Readme minor update
#10 Minimal demo tutorial
#9 Fix quantizer and update demo parameters and example for smoothed model.
#8 refactor model exporter and upload smoothed model
#7 Support w4a16 with CUDA GPU
#6 Fix profile mode and minor fix of metal kernel
#5 m1 metal for GPU inference
#4 Int4 LLaMA runtime and reference matmul
#3 int4 Intel/M1 kernels
#2 Cleanup tests and assets
#1 LLaMA runtime support