qwopqwop200 / GPTQ-for-LLaMa
4 bits quantization of LLaMA using GPTQ
Apache License 2.0 · 2.99k stars · 459 forks
Issues
#240 · Porting GPTQ to CPU? · yiliu30 · opened 1 year ago · 2 comments
#239 · AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention' · leszekhanusz · closed 1 year ago · 2 comments
#238 · adding missing transformers import to opt.py (old-cuda branch) · YellowRoseCx · closed 1 year ago · 0 comments
#237 · adding missing transformers import to opt.py (cuda branch) · YellowRoseCx · closed 1 year ago · 0 comments
#236 · 6-bit quantization · philipturner · opened 1 year ago · 1 comment
#235 · Add -O3 flag to nvcc · Noir-Lime · closed 1 year ago · 1 comment
#234 · Giepeto · IsaacGanon · closed 1 year ago · 0 comments
#233 · fastest-inference-4bit fails to build · lee-b · closed 1 year ago · 3 comments
#232 · no module named quant_cuda (fastest-inference-4bit branch) · joshlevy89 · opened 1 year ago · 1 comment
#231 · Benchmark broken on H100 · FrederikAbitz · opened 1 year ago · 0 comments
#230 · question about the zero_point · irasin · opened 1 year ago · 0 comments
#229 · running on old gpu with fp32 only · DeoLeung · opened 1 year ago · 3 comments
#228 · How to run inference with llama-65b-4bit on multiple GPUs · Minami-su · closed 1 year ago · 6 comments
#227 · Result with the branch `fastest-inference-4bit` · alanxmay · closed 1 year ago · 11 comments
#226 · where to get /path/to/downloaded/llama/weights · SeekPoint · opened 1 year ago · 0 comments
#225 · About fine-grained weight quantization · xingyueye · opened 1 year ago · 0 comments
#224 · OpenCL support · apcameron · opened 1 year ago · 1 comment
#223 · Bump protobuf from 3.20.0 to 3.20.2 · dependabot[bot] · closed 1 year ago · 0 comments
#222 · Update to the protobuf version used in the tokenizer · openloop · closed 1 year ago · 0 comments
#221 · Better, faster, smaller rotary embedding implementation in Triton · aljungberg · closed 1 year ago · 1 comment
#220 · Errors compiling with CUDA 12.1 · fcolecumberri · closed 1 year ago · 2 comments
#219 · Error on A100: device kernel image is invalid · lileilai · opened 1 year ago · 0 comments
#218 · Multi-GPU: allocate output tensor on input tensor's device · Lunderberg · closed 1 year ago · 0 comments
#217 · Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed · chigkim · opened 1 year ago · 2 comments
#216 · CUDA kernel sync problem · chu-tianxiang · closed 1 year ago · 1 comment
#215 · wbit=16 conversion gives error · sawradip · opened 1 year ago · 2 comments
#214 · CUDA benchmark on 2bit, 3bit, 4bit models: why is 3bit slower than 4bit but faster than 2bit? · sawradip · closed 1 year ago · 1 comment
#213 · 4bits on 65B · jear · closed 1 year ago · 1 comment
#212 · explicitly declare wbits and group_size · cauyxy · closed 1 year ago · 0 comments
#211 · How can I get the gradient when using a 4-bit model? · Joanna-0421 · opened 1 year ago · 0 comments
#210 · IndexError: tensors used as indices must be long, byte or bool tensors · Pathos14489 · opened 1 year ago · 2 comments
#209 · CUDA error: unknown error (when quantizing a LLaMA model) · ostix360 · opened 1 year ago · 1 comment
#208 · Add --layers-dist to define layer distribution across multiple GPUs · Thireus · closed 1 year ago · 0 comments
#207 · neox.py generates randrange() error · GenTxt · closed 1 year ago · 13 comments
#206 · Security Issue: This Auto-downloads 800 trojan viruses · freckletonj · closed 1 year ago · 2 comments
#205 · CUDA: 8bit quantized models are stupid · Ph0rk0z · opened 1 year ago · 4 comments
#204 · File "<string>", line 21, in matmul_248_kernel · moophlo · opened 1 year ago · 0 comments
#203 · Fix NameError: name 'math' is not defined · Thireus · closed 1 year ago · 0 comments
#202 · sharing gpu tensors across processes +devicemap · xloem · closed 1 year ago · 0 comments
#201 · cuda/quant.py: respect device index · xloem · closed 1 year ago · 0 comments
#200 · fix bug · qwopqwop200 · closed 1 year ago · 0 comments
#199 · Fix: TabError · USBhost · closed 1 year ago · 0 comments
#198 · NameError: name 'transformers' is not defined · catalpaaa · closed 1 year ago · 2 comments
#197 · style(project): format with yapf · tpoisonooo · closed 1 year ago · 5 comments
#196 · feat(llama.py): add SNR error · tpoisonooo · closed 1 year ago · 1 comment
#195 · WIP: feat(llama.py): quantize input · tpoisonooo · closed 1 year ago · 2 comments
#194 · Fix NameError: name 'transformers' is not defined · Thireus · closed 1 year ago · 0 comments
#193 · llama 30b generates strange answers after quantizing to 4bit · pzzmyc · closed 1 year ago · 1 comment
#192 · why disable tf32? · tpoisonooo · closed 1 year ago · 4 comments
#191 · slower inference speed · MatthewCYM · closed 1 year ago · 4 comments