wejoncy / QLLM
A general 2-8 bit quantization toolbox supporting GPTQ/AWQ/HQQ, with easy export to ONNX/ONNX Runtime.
Apache License 2.0 · 150 stars · 15 forks
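For orientation, the typical workflow behind many of the issues below is: quantize a model with one of the supported methods, then export the result for ONNX Runtime. The sketch below follows the command-line pattern from the project README; the model name, output paths, and exact flag spellings (e.g. --export_onnx) are assumptions that may differ between releases, so check `python -m qllm --help` for the current options.

    # Quantize with GPTQ at 4 bits (group size 128) and save the checkpoint
    # (model name and paths are placeholders)
    python -m qllm --model=meta-llama/Llama-2-7b-chat-hf --method=gptq \
        --wbits=4 --groupsize=128 --save=./Llama-2-7b-chat-4bit

    # Re-load the quantized checkpoint and export it to ONNX for ONNX Runtime
    python -m qllm --load=./Llama-2-7b-chat-4bit \
        --export_onnx=./Llama-2-7b-chat-4bit-onnx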
Issues
#139 · llama-2-7b-chat gptq quantize & onnx export fail: RuntimeError: The size of tensor a (4096) must match the size of tensor b (2) at non-singleton dimension 2 · lifelongeeek · opened 2 months ago · 1 comment
#138 · more Awq models && onnx kernel bug when g=-1 · wejoncy · closed 2 months ago · 0 comments
#137 · Qwen2 model quantization failed with "AssertionError: Qwen2ForCausalLM is not support" · FlexLaughing · closed 2 months ago · 3 comments
#136 · how to get the PPL? · LiMa-cas · opened 2 months ago · 1 comment
#135 · bump to 0.2.0 · wejoncy · closed 3 months ago · 0 comments
#134 · support transformers-lib loading · wejoncy · closed 3 months ago · 0 comments
#133 · "The model's quantization config from the arguments has no `quant_method` attribute" for OpenGVLab/OmniQuant · FlexLaughing · closed 3 months ago · 14 comments
#132 · fix llama3.1 · wejoncy · closed 3 months ago · 0 comments
#131 · v0.1.9.1 · wejoncy · closed 5 months ago · 0 comments
#130 · quick fix · wejoncy · closed 5 months ago · 0 comments
#129 · add macro GENERAL_TORCH to get rid of OptionalCUDAGuard · wejoncy · closed 5 months ago · 0 comments
#128 · fix version match errors · wejoncy · closed 5 months ago · 0 comments
#127 · Cannot load AWQ Quantized LLaMA 3 in Colab · x0wllaar · closed 5 months ago · 1 comment
#126 · Unsupported model IR version: 10, max supported IR version: 9 · FlexLaughing · closed 4 months ago · 30 comments
#125 · Update README.md · wejoncy · closed 5 months ago · 0 comments
#124 · add assert message && ci upgrade torch 2.2.2 · wejoncy · closed 5 months ago · 0 comments
#123 · AWQ Marlin Quantization · pandirabhishek · closed 5 months ago · 1 comment
#122 · -allow-unsupported-compiler · wejoncy · closed 5 months ago · 0 comments
#121 · Bump to 0.1.9 · wejoncy · closed 5 months ago · 0 comments
#120 · minor fix, attn_implementation · wejoncy · closed 5 months ago · 0 comments
#119 · Alibaba-NLP/gte-Qwen2-7B-instruct doesn't load properly · prattcmp · closed 5 months ago · 4 comments
#118 · AWQ Quantitative Model Inference Problem · bg51717 · closed 6 months ago · 1 comment
#117 · Problem with exporting GPTQ model to ONNX · Wendy-Xiao · closed 6 months ago · 6 comments
#116 · suggestion, make quantization possible to offload to disk instead of ram · nidhoggr-nil · opened 7 months ago · 3 comments
#115 · Fix typos · emphasis10 · closed 7 months ago · 3 comments
#114 · Fix 112 · wejoncy · closed 7 months ago · 0 comments
#113 · fix issue · wejoncy · closed 7 months ago · 0 comments
#112 · TypeError: make_mixbits_quant_linear() got an unexpected keyword argument 'device' · bg51717 · closed 7 months ago · 9 comments
#111 · bugfix · wejoncy · closed 8 months ago · 0 comments
#110 · new autogptq config format && parallel load · wejoncy · closed 8 months ago · 0 comments
#109 · Bump to 0.1.8 · wejoncy · closed 8 months ago · 0 comments
#108 · Refactor · wejoncy · closed 8 months ago · 0 comments
#107 · support awq sym · wejoncy · closed 8 months ago · 0 comments
#106 · support `MARLIN` pack_mode · wejoncy · closed 8 months ago · 0 comments
#105 · Refactor · wejoncy · closed 8 months ago · 0 comments
#104 · Onnx fix qzeros odd-shape · wejoncy · closed 8 months ago · 0 comments
#103 · bug fix · wejoncy · closed 8 months ago · 0 comments
#102 · Update README.md · wejoncy · closed 8 months ago · 0 comments
#101 · patch release v0.1.7.1 · wejoncy · closed 8 months ago · 0 comments
#100 · minor fix · wejoncy · closed 8 months ago · 0 comments
#99 · minor fix and dataset speed · wejoncy · closed 8 months ago · 0 comments
#98 · fix "disable win in release" · wejoncy · closed 8 months ago · 0 comments
#97 · refactor args · wejoncy · closed 8 months ago · 0 comments
#96 · disable win in release · wejoncy · closed 9 months ago · 0 comments
#95 · improve .cpu() with non_blocking · wejoncy · closed 9 months ago · 0 comments
#94 · Bump to 0.1.7 · wejoncy · closed 9 months ago · 0 comments
#93 · support export hqq to onnx · wejoncy · closed 9 months ago · 0 comments
#92 · ort ops support in main branch with act_order · wejoncy · closed 9 months ago · 0 comments
#91 · I'd like to do 8-bit quantization of owl-vit and export it in ONNX format · solomonmanuelraj · closed 8 months ago · 2 comments
#90 · fix attn_implementation · wejoncy · closed 10 months ago · 0 comments