mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License · 2.55k stars · 207 forks
Issues (sorted by newest)
Doesn't work on CPU: "Unable to get JIT kernel for brgemm" · #241 · andretisch · opened 2 days ago · 0 comments
Inquiry about GPU memory usage of the VILA 1.5-3b AWQ model for a 12-frame video · #240 · gj-raza · opened 1 week ago · 0 comments
Replace the FasterTransformer-style KV cache layout and kernel with FlashAttention for better support of longer sequences · #239 · JerryGJX · opened 1 week ago · 0 comments
RuntimeError: CUDA error: no kernel image is available for execution on the device · #238 · new-Sunset-shimmer · opened 1 week ago · 0 comments
Could you explain how I can change the percentage of salient weights kept in FP16? · #237 · akylbekmaxutov · opened 1 week ago · 0 comments
Cannot clone from Efficient-Large-Model/VILA.git; dependency issues with the alternative · #236 · rossgreer · opened 1 week ago · 0 comments
[QST] Why does awq write its own int3/int4 GEMM kernels instead of using CUTLASS? · #235 · SimpleTheoryOfTypes · opened 2 weeks ago · 0 comments
Unable to run Gradio demo: VILA with TinyChat on a local GPU server · #234 · mitraavi · opened 3 weeks ago · 0 comments
Support for llava_next Architecture in LLM-AWQ (Issue with Quantizing llava-hf/llava-v1.6-mistral-7b-hf) · #233 · ShobhaRajanna · opened 3 weeks ago · 0 comments
How to convert the AWQ model to safetensors after quantization · #232 · CCRss · opened 3 weeks ago · 0 comments
Regarding issues encountered with w_bit 3 quantization · #231 · langxinspieder · opened 3 weeks ago · 1 comment
About the use of calibration sets · #230 · langxinspieder · opened 3 weeks ago · 0 comments
Questions on AWQ · #229 · suhcrates-web · opened 1 month ago · 0 comments
awq_inference_engine.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c1021throwNullDataPtrErrorEv · #228 · langxinspieder · closed 3 weeks ago · 1 comment
No video inference code · #227 · Closertodeath · opened 1 month ago · 0 comments
How to merge the generated .pt file with the model structure and convert it to another format · #226 · gdgfd22 · opened 1 month ago · 1 comment
Quantizing an AutoModelForSequenceClassification model · #225 · Fenglly · opened 1 month ago · 0 comments
Update news for chunk prefilling · #224 · ys-2020 · closed 1 month ago · 0 comments
Feature 'ldmatrix' requires target sm_75 or higher when building awq_inference_engine on Tesla V100 · #223 · ShobhaRajanna · closed 1 month ago · 0 comments
AttributeError: 'LlamaConfig' object has no attribute 'rope_theta' · #222 · lvtao65535 · opened 1 month ago · 1 comment
How to Split AWQ Weights? · #221 · Azure-Tang · opened 1 month ago · 0 comments
Unsupported NVHPC compiler found. nvc++ is the only NVHPC compiler · #220 · SimWangArizona · opened 2 months ago · 0 comments
"Expected all tensors to be on the same device" when running "Perform AWQ search" on Llama3 · #219 · charlesyju · opened 2 months ago · 0 comments
FlashAttention and multi-round chat · #218 · Louym · closed 1 month ago · 5 comments
About the implementation of scaled activation · #217 · XcloudFance · opened 3 months ago · 0 comments
Batch Processing not implemented for LlavaStreamGenerator · #216 · rahulthakur319 · opened 3 months ago · 0 comments
Quantizing a model · #215 · xudawu201 · closed 3 months ago · 0 comments
NotImplementedError: <class 'transformers_modules.modeling_chatglm.ChatGLMForConditionalGeneration'> · #214 · lihaofd · opened 3 months ago · 0 comments
GGUF export support / CPU inference · #213 · TomekPro · opened 3 months ago · 0 comments
INT3 support · #212 · DavidePaglieri · closed 3 months ago · 1 comment
Perplexity results for OPT models (125M–30B) on the WikiText2 dataset? · #211 · seannz · closed 4 months ago · 0 comments
How to generate an AWQ-quantized model for llava-1.5-7b-hf · #210 · XiaotaoChen · opened 4 months ago · 1 comment
Plans for running the model on other devices? · #209 · stats-202 · opened 4 months ago · 0 comments
Update Helpful links · #208 · ys-2020 · closed 4 months ago · 0 comments
Update impact section · #207 · kentang-mit · closed 4 months ago · 0 comments
Update README.md · #206 · kentang-mit · closed 4 months ago · 0 comments
Update README.md · #205 · kentang-mit · closed 4 months ago · 0 comments
Add support for GPUs with compute capability lower than 8.0 for awq/kernels installation · #204 · rahulthakur319 · opened 4 months ago · 1 comment
How to load and run inference with the VILA-1.5-40B-AWQ model on multiple GPUs? I have 4 A30 (24 GB) GPUs and hit a CUDA out-of-memory error. · #203 · changqinyao · opened 5 months ago · 0 comments
GPU requirements · #202 · kplxwb · opened 5 months ago · 0 comments
Fix illegal memory access of GEMV kernel · #201 · xutianming · opened 5 months ago · 0 comments
Illegal memory access for Llama-3-70B · #200 · pprp · opened 5 months ago · 0 comments
Request for Semi-Structured Sparse Matrix Support in AWQ Kernel · #199 · pprp · opened 5 months ago · 0 comments
Invalid Compute Capability when building Docker pytorch:23.12 · #198 · razpa · closed 5 months ago · 1 comment
[Minor] Update VILA URL · #197 · ys-2020 · closed 5 months ago · 0 comments
Memory increases significantly during inference · #196 · xpq-tech · opened 5 months ago · 0 comments
Invalid Characters · #195 · YandongJi · opened 5 months ago · 0 comments
ROCm support request · #194 · Wintoplay · opened 5 months ago · 0 comments
Is this a bug in the quantization phase? · #193 · sleepwalker2017 · opened 6 months ago · 1 comment
google.protobuf.message.DecodeError: Error parsing message · #192 · InkyuPak · opened 6 months ago · 1 comment