mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License · 2.07k stars · 150 forks
Issues (newest first)
#203 · How to load and infer the VILA-1.5-40B-AWQ model on multiple GPUs? I currently have 4 A30 24GB GPUs and a CUDA out-of-memory error occurs. · by changqinyao · opened 4 days ago · 0 comments
#202 · GPU requirements (显卡要求) · by kplxwb · opened 1 week ago · 0 comments
#201 · Fix illegal memory access of GEMV kernel · by xutianming · opened 2 weeks ago · 0 comments
#200 · Illegal memory access for LLama-3-70B · by pprp · opened 2 weeks ago · 0 comments
#199 · Request for Semi-Structured Sparse Matrix Support in AWQ Kernel · by pprp · opened 3 weeks ago · 0 comments
#198 · Invalid Compute Capability when building Docker pytorch:23.12 · by razpa · closed 2 weeks ago · 1 comment
#197 · [Minor] Update VILA URL · by ys-2020 · closed 3 weeks ago · 0 comments
#196 · Memory increases significantly during inference · by xpq-tech · opened 4 weeks ago · 0 comments
#195 · Invalid Characters · by YandongJi · opened 4 weeks ago · 0 comments
#194 · Rocm support request · by Wintoplay · opened 1 month ago · 0 comments
#193 · Is this a bug for the quantization phase? · by sleepwalker2017 · opened 1 month ago · 1 comment
#192 · google.protobuf.message.DecodeError: Error parsing message · by InkyuPak · opened 1 month ago · 1 comment
#191 · AWQ and VILA dependency compatible issue · by chaifong92 · opened 1 month ago · 2 comments
#190 · Can you provide examples code to run inference on video QA? · by rebuttalpapers · opened 1 month ago · 1 comment
#189 · AWQ kernel Issue · by KThyo · opened 1 month ago · 0 comments
#188 · openAI-compatible tinychat API? · by DiTo97 · opened 1 month ago · 0 comments
#187 · how to support to custom module like mla in deep-seek-v2 · by robert-lee2016 · opened 1 month ago · 0 comments
#186 · tinychat.serve.model_worker_new.py AWQ model in training mode · by NigelNelson · opened 1 month ago · 0 comments
#185 · [Minor] Update News · by ys-2020 · closed 1 month ago · 0 comments
#184 · No such file or directory: "VILA1.5-13b-AWQ/llm/model-00001-of-00006.safetensors" · by kousun12 · opened 1 month ago · 7 comments
#183 · Add phi3 support · by pprp · opened 1 month ago · 4 comments
#182 · upload model_worker_new.py · by ys-2020 · closed 1 month ago · 0 comments
#181 · No module named 'awq_inference_engine' · by Alpslee · opened 1 month ago · 2 comments
#180 · VILA1.5 launch · by kentang-mit · closed 1 month ago · 0 comments
#179 · Update TinyChat README. · by ys-2020 · closed 2 months ago · 0 comments
#178 · Error while Quantizing OWLv2 model · by n9s8a · opened 2 months ago · 0 comments
#177 · Support Llama3 and update on-the-fly rope scaling · by kentang-mit · closed 2 months ago · 0 comments
#176 · AWQ for non-Transfomer Implementation · by satabios · opened 2 months ago · 3 comments
#175 · Support for Qwen models · by Huyueeer · opened 2 months ago · 2 comments
#174 · Add Mistral & Mixtral support · by Sakits · opened 2 months ago · 0 comments
#173 · awq_inference_engine is missing from source, so quantizing custom models fails · by RDouglasSharp · closed 2 months ago · 2 comments
#172 · can awq support 3-bit, 2-bit, 8-bit quantization? · by ArlanCooper · opened 2 months ago · 1 comment
#171 · Grok-1 AWQ · by jjovalle99 · opened 2 months ago · 0 comments
#170 · illegal memory access when input tokens < 8 · by casper-hansen · opened 3 months ago · 0 comments
#169 · Weight Packing Format · by jeromeku · opened 3 months ago · 0 comments
#168 · [Minor] Update README. · by ys-2020 · closed 3 months ago · 0 comments
#167 · when q-group-size = -1, the code will not run · by ZhengHSI · opened 3 months ago · 0 comments
#166 · Inquiry about Minimum GPU Requirements · by loiqy · opened 3 months ago · 1 comment
#165 · Out of memory in Jetson Orin NX 8GB · by pprp · opened 3 months ago · 0 comments
#164 · AWQ for non-transformer layers? · by satabios · opened 3 months ago · 0 comments
#163 · Possible Bug in "_search_module_scale" Function · by satabios · opened 3 months ago · 0 comments
#162 · Weight int4 quantization, but actually it is int16 · by dongxuemin666 · opened 3 months ago · 4 comments
#161 · SM_75 (Turing) Support · by 28Smiles · opened 3 months ago · 0 comments
#160 · Error while generating real quantized weights for VILA · by ocg2347 · opened 3 months ago · 0 comments
#159 · KeyError: 'llava_llama' · by huzicong · opened 3 months ago · 1 comment
#158 · run_awq.<locals>.Catcher.forward() error · by chunniunai220ml · opened 3 months ago · 0 comments
#157 · Use awq to quantize Deepseek-coder-33B-instruct model · by CarolXh · opened 3 months ago · 0 comments
#156 · Update pyproject.toml · by Louym · closed 3 months ago · 2 comments
#155 · RuntimeError: Unknown Layout in CUDA Kernel Execution · by fantasysee · opened 3 months ago · 0 comments
#154 · reproduce Llama2 7b failure: RuntimeError: The expanded size of the tensor (4608) must match the existing size (4096) at non-singleton dimension 3. Target sizes: [65, 32, 512, 4608]. Tensor sizes: [65, 1, 512, 4096] · by tuanhe · opened 3 months ago · 3 comments