mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

https://arxiv.org/abs/2211.10438

MIT License

1.26k stars 150 forks

issues

How to visualize activation outliers

#97 harleyszhang opened 3 weeks ago
0
The upper and lower bounds seem not to be 8 bits in some cases

#96 zhangyu68 opened 3 weeks ago
0
Why only 4 layers?

#95 VincentXWD opened 2 months ago
0
Support for Qwen2

#94 JiaXinLI98 opened 3 months ago
0
How to quantize the out_proj and fc2 module in OPT model family

#93 yanchenmochen opened 3 months ago
0
How to quantize llama3?

#92 jpyo0803 opened 4 months ago
0
export_int8_model.py size issue

#91 ljhyeok123 opened 4 months ago
1
Quantize other models

#90 AlexMa0 opened 4 months ago
0
best Alpha value for Qwen 1.5 72B

#89 Riskin1999 opened 5 months ago
0
how to draw this result directly? is there any script?

#88 foreverpiano opened 5 months ago
1
Huggingface_Hub Issue

#87 faize5 opened 6 months ago
2
Can SmoothQuant be used on ViT models?

#86 n9s8a opened 7 months ago
0
Can Stable Diffusion be supported?

#85 songh11 opened 7 months ago
0
Inquiry about Int8 BMM overflow

#84 luzai opened 7 months ago
0
Error when running smoothquant_opt_real_int8_demo.ipynb

#83 kaijun924 opened 7 months ago
0
how to use model.generate with smoothquant models

#82 Hao-YunDeng opened 7 months ago
0
Which versions of the transformers and datasets packages do we need for this repo?

#81 ghost opened 8 months ago
2
adjust activations

#80 muzi0111 opened 8 months ago
0
Question: why is explicit scaling not needed for activation X?

#79 ghost opened 8 months ago
2
RuntimeError: "clamp_min_cpu" not implemented for 'Half'

#78 ghost closed 8 months ago
1
Weight migration for Llama?

#77 atyshka opened 8 months ago
0
Question about code

#76 Lucky-Lance opened 8 months ago
0
How Can I Peft the Smoothquanted LLM?

#75 LameloBally opened 8 months ago
1
bmm_s8t_s8n_s8t cannot run with this shape

#74 xiachong94 closed 8 months ago
0
Can I reproduce SmoothQuant on CPU only since I see that torch-int8 requires a GPU, and I am only interested in inference on the CPU?

#73 WCSY-YG opened 9 months ago
0
Setting quantize_output to True makes the accuracy drop to 0

#72 lonleyodd opened 10 months ago
0
ask for a function in linear.py for smoothquant in llama @Anizpz

#71 msz12345 opened 10 months ago
0
Does W8A8 require dequantization during forward inference?

#70 shatealaboxiaowang opened 11 months ago
1
general question about SmoothQuant kv-cache quantization

#69 brisker opened 11 months ago
1
Got accuracy=0 when trying _real_int8_demo.ipynb

#68 leocnj opened 11 months ago
0
how to reproduce ppl of wikitext2?

#67 Arthur-Ling opened 11 months ago
1
Activation scales for bloomz 7.1b

#66 bil-ash opened 11 months ago
1
support auto search for per-layer smoothing alphas, and auto clip for weights, both bits-aware, can do W4A8 with minor loss

#65 yyfcc17 closed 11 months ago
2
What does the accuracy in Figure 7 of the paper mean?

#64 YundongGai opened 12 months ago
0
Demo code for Bloom model?

#63 llCurious opened 12 months ago
0
Inference time decreases only by 7.5% on opt-6.7B

#62 FurryMushroom opened 1 year ago
1
llama-2-chat demo

#61 liquanfeng closed 11 months ago
0
pickle.UnpicklingError: invalid load key, 'v'.

#60 baiSongL opened 1 year ago
2
failed to run int8 opt

#59 jackzhou121 closed 1 year ago
2
UnpicklingError: invalid load key, 'v'.

#58 FurryMushroom closed 12 months ago
7
add llama model support

#57 AniZpZ opened 1 year ago
1
which is faster between smoothquant and autogptq?

#56 InkdyeHuang opened 1 year ago
0
[BUG] Int8 inference with torch-int encounter errors

#55 WelY1 opened 1 year ago
0
How to calculate Alpha?

#54 Triple-L opened 1 year ago
0
Why do different models have the same size?

#53 WelY1 opened 1 year ago
0
Activation Channel Scales and Calibration

#52 520zw opened 1 year ago
1
The ppl value of the opt-6.7b-smoothquant model shows abnormal performance

#51 sitabulaixizawaluduo opened 1 year ago
1
circular import

#50 breaddance opened 1 year ago
0
Can you explain in a step by step manner how we can implement this on our own model and dataset?

#49 shahaamirbader opened 1 year ago
0
How to reproduce the performance described in the paper

#48 rolex-cjj opened 1 year ago
2