mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License · 2.55k stars · 207 forks
Issues (sorted by newest)
Doesn't work on CPU: "Unable to get JIT kernel for brgemm" · #241 · andretisch · opened 2 days ago · 0 comments
Inquiry about GPU memory usage of the VILA 1.5-3b AWQ model for a 12-frame video · #240 · gj-raza · opened 1 week ago · 0 comments
Replace the FasterTransformer-style KV cache layout and kernel with FlashAttention for better support of longer sequences · #239 · JerryGJX · opened 1 week ago · 0 comments
RuntimeError: CUDA error: no kernel image is available for execution on the device · #238 · new-Sunset-shimmer · opened 1 week ago · 0 comments
Could you explain how I can change the percentage of salient weights kept in FP16? · #237 · akylbekmaxutov · opened 1 week ago · 0 comments
Cannot clone from Efficient-Large-Model/VILA.git; dependency issues with the alternative · #236 · rossgreer · opened 1 week ago · 0 comments
[QST] Why does awq write its own int3/int4 GEMM kernels instead of using CUTLASS? · #235 · SimpleTheoryOfTypes · opened 2 weeks ago · 0 comments
Unable to run Gradio demo: VILA with TinyChat on a local GPU server · #234 · mitraavi · opened 3 weeks ago · 0 comments
Support for llava_next Architecture in LLM-AWQ (Issue with Quantizing llava-hf/llava-v1.6-mistral-7b-hf) · #233 · ShobhaRajanna · opened 3 weeks ago · 0 comments
How to convert the AWQ model to safetensors after quantization · #232 · CCRss · opened 3 weeks ago · 0 comments
Regarding issues encountered with w_bit 3 quantization · #231 · langxinspieder · opened 3 weeks ago · 1 comment
About the use of calibration sets · #230 · langxinspieder · opened 3 weeks ago · 0 comments
Questions on AWQ · #229 · suhcrates-web · opened 1 month ago · 0 comments
awq_inference_engine.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c1021throwNullDataPtrErrorEv · #228 · langxinspieder · closed 3 weeks ago · 1 comment
No video inference code · #227 · Closertodeath · opened 1 month ago · 0 comments
How to merge the generated .pt file with the model structure and convert it to another format · #226 · gdgfd22 · opened 1 month ago · 1 comment
Quantizing an AutoModelForSequenceClassification model · #225 · Fenglly · opened 1 month ago · 0 comments
Update news for chunk prefilling · #224 · ys-2020 · closed 1 month ago · 0 comments
Feature 'ldmatrix' requires target sm_75 or higher when building awq_inference_engine on Tesla V100 · #223 · ShobhaRajanna · closed 1 month ago · 0 comments
AttributeError: 'LlamaConfig' object has no attribute 'rope_theta' · #222 · lvtao65535 · opened 1 month ago · 1 comment
How to Split AWQ Weights? · #221 · Azure-Tang · opened 1 month ago · 0 comments
Unsupported NVHPC compiler found. nvc++ is the only NVHPC compiler · #220 · SimWangArizona · opened 2 months ago · 0 comments
"Expected all tensors to be on the same device" when running "Perform AWQ search" on Llama3 · #219 · charlesyju · opened 2 months ago · 0 comments
FlashAttention and multi-round chat · #218 · Louym · closed 1 month ago · 5 comments
About the implementation of scaled activation · #217 · XcloudFance · opened 3 months ago · 0 comments
Batch Processing not implemented for LlavaStreamGenerator · #216 · rahulthakur319 · opened 3 months ago · 0 comments
Quantizing a model · #215 · xudawu201 · closed 3 months ago · 0 comments
NotImplementedError: <class 'transformers_modules.modeling_chatglm.ChatGLMForConditionalGeneration'> · #214 · lihaofd · opened 3 months ago · 0 comments
GGUF export support / CPU inference · #213 · TomekPro · opened 3 months ago · 0 comments
INT3 support · #212 · DavidePaglieri · closed 3 months ago · 1 comment
Perplexity results for OPT models (125M–30B) on the WikiText2 dataset? · #211 · seannz · closed 4 months ago · 0 comments
How to generate an AWQ-quantized model for llava-1.5-7b-hf · #210 · XiaotaoChen · opened 4 months ago · 1 comment
Plans for running the model on other devices? · #209 · stats-202 · opened 4 months ago · 0 comments
Update Helpful links · #208 · ys-2020 · closed 4 months ago · 0 comments
Update impact section · #207 · kentang-mit · closed 4 months ago · 0 comments
Update README.md · #206 · kentang-mit · closed 4 months ago · 0 comments
Update README.md · #205 · kentang-mit · closed 4 months ago · 0 comments
Add support for GPUs with compute capability lower than 8.0 for awq/kernels installation · #204 · rahulthakur319 · opened 4 months ago · 1 comment
How to load and run inference with the VILA-1.5-40B-AWQ model on multiple GPUs? I have 4 A30 (24 GB) GPUs and hit a CUDA out-of-memory error. · #203 · changqinyao · opened 5 months ago · 0 comments
GPU requirements · #202 · kplxwb · opened 5 months ago · 0 comments
Fix illegal memory access of GEMV kernel · #201 · xutianming · opened 5 months ago · 0 comments
Illegal memory access for Llama-3-70B · #200 · pprp · opened 5 months ago · 0 comments
Request for Semi-Structured Sparse Matrix Support in AWQ Kernel · #199 · pprp · opened 5 months ago · 0 comments
Invalid Compute Capability when building Docker pytorch:23.12 · #198 · razpa · closed 5 months ago · 1 comment
[Minor] Update VILA URL · #197 · ys-2020 · closed 5 months ago · 0 comments
Memory increases significantly during inference · #196 · xpq-tech · opened 5 months ago · 0 comments
Invalid Characters · #195 · YandongJi · opened 5 months ago · 0 comments
ROCm support request · #194 · Wintoplay · opened 5 months ago · 0 comments
Is this a bug in the quantization phase? · #193 · sleepwalker2017 · opened 6 months ago · 1 comment
google.protobuf.message.DecodeError: Error parsing message · #192 · InkyuPak · opened 6 months ago · 1 comment