mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
https://arxiv.org/abs/2211.10438
MIT License
1.26k stars
150 forks
issues
Newest
How to visualize activation outliers
#97
harleyszhang
opened
3 weeks ago
0
The upper and lower bounds seem not to be 8 bits in some cases
#96
zhangyu68
opened
3 weeks ago
0
Why only 4 layers?
#95
VincentXWD
opened
2 months ago
0
Support for Qwen2
#94
JiaXinLI98
opened
3 months ago
0
How to quantize the out_proj and fc2 module in OPT model family
#93
yanchenmochen
opened
3 months ago
0
How to quantize llama3?
#92
jpyo0803
opened
4 months ago
0
export_int8_model.py size issue
#91
ljhyeok123
opened
4 months ago
1
Quantize other models
#90
AlexMa0
opened
4 months ago
0
best Alpha value for Qwen 1.5 72B
#89
Riskin1999
opened
5 months ago
0
how to draw this result directly? is there any script?
#88
foreverpiano
opened
5 months ago
1
Huggingface_Hub Issue
#87
faize5
opened
6 months ago
2
Can SmoothQuant be used on ViT models?
#86
n9s8a
opened
7 months ago
0
Can Stable Diffusion be supported?
#85
songh11
opened
7 months ago
0
Inquiry about Int8 BMM overflow
#84
luzai
opened
7 months ago
0
Error when running smoothquant_opt_real_int8_demo.ipynb
#83
kaijun924
opened
7 months ago
0
how to use model.generate with smoothquant models
#82
Hao-YunDeng
opened
7 months ago
0
Which versions of the transformers and datasets packages do we need for this repo?
#81
ghost
opened
8 months ago
2
adjust activations
#80
muzi0111
opened
8 months ago
0
Question: why is explicit scaling not needed for activation X?
#79
ghost
opened
8 months ago
2
RuntimeError: "clamp_min_cpu" not implemented for 'Half'
#78
ghost
closed
8 months ago
1
Weight migration for Llama?
#77
atyshka
opened
8 months ago
0
Question about code
#76
Lucky-Lance
opened
8 months ago
0
How Can I Peft the Smoothquanted LLM?
#75
LameloBally
opened
8 months ago
1
bmm_s8t_s8n_s8t cannot run with this shape
#74
xiachong94
closed
8 months ago
0
Can I reproduce SmoothQuant on CPU only since I see that torch-int8 requires a GPU, and I am only interested in inference on the CPU?
#73
WCSY-YG
opened
9 months ago
0
Setting quantize_output to True drops accuracy to 0
#72
lonleyodd
opened
10 months ago
0
ask for a function in linear.py for smoothquant in llama @Anizpz
#71
msz12345
opened
10 months ago
0
Does W8A8 require dequantization during forward inference?
#70
shatealaboxiaowang
opened
11 months ago
1
general question about SmoothQuant kv-cache quantization
#69
brisker
opened
11 months ago
1
Got accuracy=0 when trying _real_int8_demo.ipynb
#68
leocnj
opened
11 months ago
0
how to reproduce ppl of wikitext2?
#67
Arthur-Ling
opened
11 months ago
1
Activation scales for bloomz 7.1b
#66
bil-ash
opened
11 months ago
1
support auto search for per-layer smoothing alphas, and auto clip for weights, both bits-aware, can do W4A8 with minor loss
#65
yyfcc17
closed
11 months ago
2
What does the accuracy in Figure 7 of the paper mean?
#64
YundongGai
opened
12 months ago
0
Demo code for Bloom model?
#63
llCurious
opened
12 months ago
0
Inference time decreases only by 7.5% on opt-6.7B
#62
FurryMushroom
opened
1 year ago
1
llama-2-chat demo
#61
liquanfeng
closed
11 months ago
0
pickle.UnpicklingError: invalid load key, 'v'.
#60
baiSongL
opened
1 year ago
2
failed to run int8 opt
#59
jackzhou121
closed
1 year ago
2
UnpicklingError: invalid load key, 'v'.
#58
FurryMushroom
closed
12 months ago
7
add llama model support
#57
AniZpZ
opened
1 year ago
1
which is faster between smoothquant and autogptq?
#56
InkdyeHuang
opened
1 year ago
0
[BUG] Int8 inference with torch-int encounters errors
#55
WelY1
opened
1 year ago
0
How to calculate Alpha?
#54
Triple-L
opened
1 year ago
0
Why do different models have the same size?
#53
WelY1
opened
1 year ago
0
Activation Channel Scales and Calibration
#52
520zw
opened
1 year ago
1
The ppl value of the opt-6.7b-smoothquant model shows abnormal performance
#51
sitabulaixizawaluduo
opened
1 year ago
1
circular import
#50
breaddance
opened
1 year ago
0
Can you explain in a step by step manner how we can implement this on our own model and dataset?
#49
shahaamirbader
opened
1 year ago
0
How to reproduce the performance described in the paper
#48
rolex-cjj
opened
1 year ago
2