intel / auto-round · issues
Advanced quantization algorithm for LLMs/VLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs" (https://arxiv.org/abs/2309.05516).
Apache License 2.0 · 261 stars · 22 forks
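The paper behind this repo tunes weight rounding with signed gradient descent: each weight gets a learnable rounding offset in [-0.5, 0.5] that decides whether it rounds up or down, updated by the sign of a straight-through gradient of the output reconstruction error. A toy NumPy sketch of that idea (shapes, learning rate, and loop structure are illustrative assumptions, not the repo's actual API):

```python
import numpy as np

def quantize_sym(w, v, bits=4):
    """Symmetric quantization with a learnable rounding offset v in [-0.5, 0.5].

    v shifts each weight before rounding, so it controls the round-up/round-down
    decision -- the quantity signed gradient descent tunes.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale + v), -qmax - 1, qmax)
    return q * scale, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))    # hypothetical weight matrix (out x in)
x = rng.standard_normal((128, 64))   # hypothetical calibration activations
v = np.zeros_like(w)                 # start at plain round-to-nearest
lr = 0.5 / 200                       # step size: |v| can traverse its full range in 200 steps

for _ in range(200):
    wq, scale = quantize_sym(w, v)
    err = x @ (wq - w).T                           # output error on calibration data
    grad_v = scale * (2.0 / err.size) * err.T @ x  # straight-through gradient w.r.t. v
    v = np.clip(v - lr * np.sign(grad_v), -0.5, 0.5)  # signed gradient descent step
```

Because only the sign of the gradient is used, the update magnitude is fixed by the learning-rate schedule rather than the (noisy) gradient scale, and clipping keeps each offset within half a quantization bin.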
| # | Title | Author | Status | Comments |
| --- | --- | --- | --- | --- |
| #359 | add gpu ut and manually test before test | wenhuach21 | opened 3 hours ago | 0 |
| #358 | fix awq exporting | wenhuach21 | opened 3 hours ago | 0 |
| #357 | export to awq exception due to transformers 4.7 API change | wenhuach21 | opened 17 hours ago | 0 |
| #356 | add QWQ-32B, VLM, Qwen2.5, Llama3.1 int4 models | wenhuach21 | closed 16 hours ago | 0 |
| #355 | full range sym or autoround is not stable for llama3.1 70B | wenhuach21 | opened 1 day ago | 0 |
| #354 | delete llm example and refine readme | wenhuach21 | closed 2 days ago | 0 |
| #353 | [low priority] vlm export to autoawq | wenhuach21 | opened 2 days ago | 0 |
| #352 | consecutive quantization for the same model with different config bug | wenhuach21 | opened 2 days ago | 0 |
| #351 | Unexpected behavior of only_text_test check due to inference issue on cpu | WeiweiZhang1 | opened 2 days ago | 0 |
| #350 | bump version into v0.4.1 | XuehaoSun | closed 2 days ago | 0 |
| #349 | [Critical Bug] API use sym as default | wenhuach21 | closed 2 days ago | 0 |
| #348 | triton backend requires < 3.0 | wenhuach21 | closed 2 days ago | 0 |
| #347 | Update docker user and remove baseline UT | XuehaoSun | closed 2 days ago | 0 |
| #346 | [HPU] Enhance installation check | yiliu30 | closed 2 days ago | 0 |
| #345 | [HPU] Enhance `numba` check | yiliu30 | closed 3 days ago | 0 |
| #344 | [VLM] fix bs and grad reset | n1ck-guo | closed 2 days ago | 0 |
| #343 | vllm/llama-vision llava calibration infinite loop fix | WeiweiZhang1 | closed 3 days ago | 1 |
| #342 | fix typo | wenhuach21 | closed 5 days ago | 0 |
| #341 | bump version into v0.4 | XuehaoSun | closed 1 week ago | 0 |
| #340 | add cogvlm recipe and refine readme | WeiweiZhang1 | closed 1 week ago | 0 |
| #339 | cogvlm doc | n1ck-guo | closed 1 week ago | 0 |
| #338 | add qwen2.5 recipe and refine readme | WeiweiZhang1 | closed 1 week ago | 0 |
| #337 | Exclude markdown files from unit test pipelines | XuehaoSun | closed 1 week ago | 0 |
| #336 | refine mllm docs | WeiweiZhang1 | closed 1 week ago | 0 |
| #335 | fix model_dtype issue and reformat mllm code | wenhuach21 | closed 1 week ago | 0 |
| #334 | refine mllm API and add help info | n1ck-guo | closed 1 week ago | 0 |
| #333 | add tips and tricks for llm&mllm quantization | wenhuach21 | closed 1 week ago | 0 |
| #332 | fix eval_bs in fake format and reset auto-gptq exporting max_shard_size | wenhuach21 | closed 1 week ago | 0 |
| #331 | Simulated W4Afp8 Quantization | wenhuach21 | closed 1 day ago | 0 |
| #330 | Increase unit test timeout to 120 minutes | XuehaoSun | closed 1 week ago | 0 |
| #329 | fix mllm dataset config bug and add gptq cuda backend | wenhuach21 | closed 1 week ago | 1 |
| #328 | fix the bug of test model support for test-only | n1ck-guo | closed 1 week ago | 1 |
| #327 | set default mllm dataset | n1ck-guo | closed 1 week ago | 1 |
| #326 | fix fp_layers issue and force to FP16 on cuda for autoround format inference | wenhuach21 | closed 1 week ago | 0 |
| #325 | fix autogptq version error | wenhuach21 | closed 2 weeks ago | 0 |
| #324 | add warning for no gptq exllamav2 kernel | wenhuach21 | closed 2 weeks ago | 0 |
| #323 | Some models like Qwen overflow/downflow with CUDA kernel | wenhuach21 | closed 1 week ago | 2 |
| #322 | add pile calib, rename quant_block_list to to_quant_block_names | WeiweiZhang1 | closed 2 weeks ago | 2 |
| #321 | fix multiple device bug | wenhuach21 | closed 2 weeks ago | 0 |
| #320 | Revert "Update the check for HPU" | yiliu30 | closed 2 weeks ago | 1 |
| #319 | fix eval device issue | wenhuach21 | closed 2 weeks ago | 0 |
| #318 | Update the check for HPU | yiliu30 | closed 2 weeks ago | 1 |
| #317 | new mllm eval | n1ck-guo | closed 2 weeks ago | 0 |
| #316 | fix merge error | n1ck-guo | closed 2 weeks ago | 1 |
| #315 | Add cpu only version | XuehaoSun | closed 2 weeks ago | 0 |
| #314 | add torch compile arg | wenhuach21 | closed 2 weeks ago | 0 |
| #313 | qwen2 vision quantization bugfix | WeiweiZhang1 | closed 3 weeks ago | 0 |
| #312 | multiple gpu evaluation/calibration refine | wenhuach21 | closed 3 weeks ago | 0 |
| #311 | fix typo | wenhuach21 | closed 3 weeks ago | 0 |
| #310 | Update autogptq exporting | wenhuach21 | closed 3 weeks ago | 0 |