wejoncy / QLLM
A general 2-8 bit quantization toolbox supporting GPTQ/AWQ/HQQ, with easy export to ONNX/ONNX Runtime.
Apache License 2.0 · 150 stars · 15 forks
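For orientation, the typical workflow behind many of the issues below is: quantize a model with one of the supported methods, then export the result for ONNX Runtime. The sketch below follows the command-line pattern from the project README; the model name, output paths, and exact flag spellings (e.g. --export_onnx) are assumptions that may differ between releases, so check `python -m qllm --help` for the current options.

    # Quantize with GPTQ at 4 bits (group size 128) and save the checkpoint
    # (model name and paths are placeholders)
    python -m qllm --model=meta-llama/Llama-2-7b-chat-hf --method=gptq \
        --wbits=4 --groupsize=128 --save=./Llama-2-7b-chat-4bit

    # Re-load the quantized checkpoint and export it to ONNX for ONNX Runtime
    python -m qllm --load=./Llama-2-7b-chat-4bit \
        --export_onnx=./Llama-2-7b-chat-4bit-onnx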
Issues
#139 · llama-2-7b-chat gptq quantize & onnx export fail: RuntimeError: The size of tensor a (4096) must match the size of tensor b (2) at non-singleton dimension 2 · lifelongeeek · opened 2 months ago · 1 comment
#138 · more Awq models && onnx kernel bug when g=-1 · wejoncy · closed 2 months ago · 0 comments
#137 · Qwen2 model quantization failed with "AssertionError: Qwen2ForCausalLM is not support" · FlexLaughing · closed 2 months ago · 3 comments
#136 · how to get the PPL? · LiMa-cas · opened 2 months ago · 1 comment
#135 · bump to 0.2.0 · wejoncy · closed 3 months ago · 0 comments
#134 · support transformers-lib loading · wejoncy · closed 3 months ago · 0 comments
#133 · "The model's quantization config from the arguments has no `quant_method` attribute" for OpenGVLab/OmniQuant · FlexLaughing · closed 3 months ago · 14 comments
#132 · fix llama3.1 · wejoncy · closed 3 months ago · 0 comments
#131 · v0.1.9.1 · wejoncy · closed 5 months ago · 0 comments
#130 · quick fix · wejoncy · closed 5 months ago · 0 comments
#129 · add macro GENERAL_TORCH to get rid of OptionalCUDAGuard · wejoncy · closed 5 months ago · 0 comments
#128 · fix version match errors · wejoncy · closed 5 months ago · 0 comments
#127 · Cannot load AWQ Quantized LLaMA 3 in Colab · x0wllaar · closed 5 months ago · 1 comment
#126 · Unsupported model IR version: 10, max supported IR version: 9 · FlexLaughing · closed 4 months ago · 30 comments
#125 · Update README.md · wejoncy · closed 5 months ago · 0 comments
#124 · add assert message && ci upgrade torch 2.2.2 · wejoncy · closed 5 months ago · 0 comments
#123 · AWQ Marlin Quantization · pandirabhishek · closed 5 months ago · 1 comment
#122 · -allow-unsupported-compiler · wejoncy · closed 5 months ago · 0 comments
#121 · Bump to 0.1.9 · wejoncy · closed 5 months ago · 0 comments
#120 · minor fix, attn_implementation · wejoncy · closed 5 months ago · 0 comments
#119 · Alibaba-NLP/gte-Qwen2-7B-instruct doesn't load properly · prattcmp · closed 5 months ago · 4 comments
#118 · AWQ Quantitative Model Inference Problem · bg51717 · closed 6 months ago · 1 comment
#117 · Problem with exporting GPTQ model to ONNX · Wendy-Xiao · closed 6 months ago · 6 comments
#116 · suggestion, make quantization possible to offload to disk instead of ram · nidhoggr-nil · opened 7 months ago · 3 comments
#115 · Fix typos · emphasis10 · closed 7 months ago · 3 comments
#114 · Fix 112 · wejoncy · closed 7 months ago · 0 comments
#113 · fix issue · wejoncy · closed 7 months ago · 0 comments
#112 · TypeError: make_mixbits_quant_linear() got an unexpected keyword argument 'device' · bg51717 · closed 7 months ago · 9 comments
#111 · bugfix · wejoncy · closed 8 months ago · 0 comments
#110 · new autogptq config format && parallel load · wejoncy · closed 8 months ago · 0 comments
#109 · Bump to 0.1.8 · wejoncy · closed 8 months ago · 0 comments
#108 · Refactor · wejoncy · closed 8 months ago · 0 comments
#107 · support awq sym · wejoncy · closed 8 months ago · 0 comments
#106 · support `MARLIN` pack_mode · wejoncy · closed 8 months ago · 0 comments
#105 · Refactor · wejoncy · closed 8 months ago · 0 comments
#104 · Onnx fix qzeros odd-shape · wejoncy · closed 8 months ago · 0 comments
#103 · bug fix · wejoncy · closed 8 months ago · 0 comments
#102 · Update README.md · wejoncy · closed 8 months ago · 0 comments
#101 · patch release v0.1.7.1 · wejoncy · closed 8 months ago · 0 comments
#100 · minor fix · wejoncy · closed 8 months ago · 0 comments
#99 · minor fix and dataset speed · wejoncy · closed 8 months ago · 0 comments
#98 · fix "disable win in release" · wejoncy · closed 8 months ago · 0 comments
#97 · refactor args · wejoncy · closed 8 months ago · 0 comments
#96 · disable win in release · wejoncy · closed 9 months ago · 0 comments
#95 · improve .cpu() with non_blocking · wejoncy · closed 9 months ago · 0 comments
#94 · Bump to 0.1.7 · wejoncy · closed 9 months ago · 0 comments
#93 · support export hqq to onnx · wejoncy · closed 9 months ago · 0 comments
#92 · ort ops support in main branch with act_order · wejoncy · closed 9 months ago · 0 comments
#91 · I'd like to do 8-bit quantization of owl-vit and export it in ONNX format · solomonmanuelraj · closed 8 months ago · 2 comments
#90 · fix attn_implementation · wejoncy · closed 10 months ago · 0 comments