pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License · 5.34k stars · 484 forks

Issues (sorted newest first)
Number | Title | Author | State | Age | Comments
#137 | CUDA error if enabling compile_prefill for quantization model (int8) | yanboliang | opened | 3 months ago | 7
#136 | Do not attempt to import distributed primitives on MacOS | malfet | closed | 3 months ago | 0
#135 | Update generate.py to enable MPS support | mikekgfb | closed | 3 months ago | 0
#134 | GGUF fp32/fp16 conversion to checkpoint | mergennachin | opened | 3 months ago | 1
#133 | Optimized the process of loading PyTorch state dictionaries, merging … | hvaria | opened | 3 months ago | 2
#132 | Set device to CPU if CUDA not available in some arguments | cpuhrsch | closed | 3 months ago | 0
#131 | Update to use torch.nn.attention.sdpa_kernel | yanboliang | opened | 3 months ago | 2
#130 | moe/download.py - ignore safetensors download | michaelfeil | closed | 3 months ago | 5
#129 | int4/int4-gptq support in Mixtral 8x7B | yanbing-j | opened | 3 months ago | 2
#128 | Mixtral MoE improvements: transposed w2 to have reduction dim be innermost dim | yanboliang | closed | 3 months ago | 0
#127 | Reducing Latency in Application with Torch Compilation: Initialization and Inference Optimization | daniyal214 | opened | 3 months ago | 0
#126 | index out of range: No transformer config could be loaded | SinanAkkoyun | opened | 3 months ago | 1
#125 | Int4 perplexity | SinanAkkoyun | opened | 3 months ago | 0
#124 | Can't quantize to int4 and can't compile on RTX2080Ti | kaizizzzzzz | closed | 2 months ago | 2
#123 | Add weight only quantization support for cpu device | mingfeima | closed | 3 months ago | 2
#122 | Speculative decoding with draft model:TinyLlama-1.1B | kaizizzzzzz | closed | 4 months ago | 0
#121 | Set torch._dynamo.config.max_loop_unroll_nodes to 7500 | yifuwang | closed | 4 months ago | 1
#120 | AMD RX 7900 XTX Wrong outputs | makaveli10 | closed | 3 months ago | 0
#119 | Updating requirements.txt and .gitignore | Artyom17 | opened | 4 months ago | 0
#118 | Clean up mixtral-moe | yanboliang | closed | 4 months ago | 0
#117 | Minor fix for generate.py | Artyom17 | closed | 4 months ago | 0
#116 | Adding Mistral-7B support | Artyom17 | closed | 4 months ago | 1
#115 | [example] Added gemma support | Chillee | opened | 4 months ago | 4
#114 | Question about the gennerated code of `WeightOnlyInt8Linear` | feiyuvl | opened | 4 months ago | 5
#113 | Fixing synchronization issue when non-zero GPU is used as a device | Artyom17 | closed | 4 months ago | 4
#112 | batching/dynamic batching | nivibilla | opened | 4 months ago | 1
#111 | Try Tensor Parallel on a server equipped with two V100 linked by NVLINK, but got a performance degradation | duanzhaol | opened | 4 months ago | 8
#110 | Update README link | yanboliang | closed | 4 months ago | 0
#109 | Update README to add link to Mixtral MoE folder | yanboliang | closed | 4 months ago | 1
#108 | What happens to bias during int8 quantization? | gchhablani | opened | 4 months ago | 3
#107 | Questions on Speculative Decoding in gpt-fast generate.py | hxer7963 | opened | 4 months ago | 2
#106 | Update README.md | MDK8888 | closed | 4 months ago | 3
#105 | Add Mixtral-8x7B in sub-folder | yanboliang | closed | 4 months ago | 0
#104 | improving GPTQ defauls | HDCharles | closed | 4 months ago | 0
#103 | Remove unnecessary wrapper code | HDCharles | closed | 4 months ago | 0
#102 | [quant] Add int8 per token dynamic quant + int4 per group quant for ExecuTorch | jerryzh168 | opened | 4 months ago | 1
#101 | fixing over padding and GPTQ padding bug | jerryzh168 | closed | 4 months ago | 0
#100 | fixing over padding and GPTQ padding bug | HDCharles | closed | 4 months ago | 0
#99 | Bandwidth achieved for INT8 is much smaller than FP16 | yafehlis | opened | 4 months ago | 3
#97 | fixing circular import | HDCharles | closed | 4 months ago | 1
#94 | pass@1 score extremely low using GPT-fast API | yafehlis | closed | 4 months ago | 3
#93 | Fixes for eval and GPTQ after move to gpt-fast | HDCharles | closed | 5 months ago | 0
#92 | I try to speed up with llava,but this it slower then eager mode,why? | bleedingfight | opened | 5 months ago | 1
#91 | Updating eval for lm_eval 0.4 and 0.3 | HDCharles | closed | 4 months ago | 0
#90 | Can GPT-Fast support larger batch sizes | yetingqiaqia | closed | 4 months ago | 3
#89 | `eval.py` uses older version of lm_eval | nairbv | closed | 4 months ago | 1
#88 | Size mismatch error occurs when loading models quantized by GPTQ | sdc17 | opened | 5 months ago | 1
#87 | RuntimeError: CUDA error: named symbol not found | ce1190222 | opened | 5 months ago | 1
#86 | How is llama-7b trained, what is the verification accuracy? | frankxyy | closed | 4 months ago | 2
#85 | generate.py: remove duplicate if condition | guoyejun | closed | 3 months ago | 0