pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License · 5.34k stars · 484 forks

Issues (sorted newest first)
Number | Title | Author | State | Age | Comments
#137 | CUDA error if enabling compile_prefill for quantization model (int8) | yanboliang | opened | 3 months ago | 7
#136 | Do not attempt to import distributed primitives on MacOS | malfet | closed | 3 months ago | 0
#135 | Update generate.py to enable MPS support | mikekgfb | closed | 3 months ago | 0
#134 | GGUF fp32/fp16 conversion to checkpoint | mergennachin | opened | 3 months ago | 1
#133 | Optimized the process of loading PyTorch state dictionaries, merging … | hvaria | opened | 3 months ago | 2
#132 | Set device to CPU if CUDA not available in some arguments | cpuhrsch | closed | 3 months ago | 0
#131 | Update to use torch.nn.attention.sdpa_kernel | yanboliang | opened | 3 months ago | 2
#130 | moe/download.py - ignore safetensors download | michaelfeil | closed | 3 months ago | 5
#129 | int4/int4-gptq support in Mixtral 8x7B | yanbing-j | opened | 3 months ago | 2
#128 | Mixtral MoE improvements: transposed w2 to have reduction dim be innermost dim | yanboliang | closed | 3 months ago | 0
#127 | Reducing Latency in Application with Torch Compilation: Initialization and Inference Optimization | daniyal214 | opened | 3 months ago | 0
#126 | index out of range: No transformer config could be loaded | SinanAkkoyun | opened | 3 months ago | 1
#125 | Int4 perplexity | SinanAkkoyun | opened | 3 months ago | 0
#124 | Can't quantize to int4 and can't compile on RTX2080Ti | kaizizzzzzz | closed | 2 months ago | 2
#123 | Add weight only quantization support for cpu device | mingfeima | closed | 3 months ago | 2
#122 | Speculative decoding with draft model:TinyLlama-1.1B | kaizizzzzzz | closed | 4 months ago | 0
#121 | Set torch._dynamo.config.max_loop_unroll_nodes to 7500 | yifuwang | closed | 4 months ago | 1
#120 | AMD RX 7900 XTX Wrong outputs | makaveli10 | closed | 3 months ago | 0
#119 | Updating requirements.txt and .gitignore | Artyom17 | opened | 4 months ago | 0
#118 | Clean up mixtral-moe | yanboliang | closed | 4 months ago | 0
#117 | Minor fix for generate.py | Artyom17 | closed | 4 months ago | 0
#116 | Adding Mistral-7B support | Artyom17 | closed | 4 months ago | 1
#115 | [example] Added gemma support | Chillee | opened | 4 months ago | 4
#114 | Question about the gennerated code of `WeightOnlyInt8Linear` | feiyuvl | opened | 4 months ago | 5
#113 | Fixing synchronization issue when non-zero GPU is used as a device | Artyom17 | closed | 4 months ago | 4
#112 | batching/dynamic batching | nivibilla | opened | 4 months ago | 1
#111 | Try Tensor Parallel on a server equipped with two V100 linked by NVLINK, but got a performance degradation | duanzhaol | opened | 4 months ago | 8
#110 | Update README link | yanboliang | closed | 4 months ago | 0
#109 | Update README to add link to Mixtral MoE folder | yanboliang | closed | 4 months ago | 1
#108 | What happens to bias during int8 quantization? | gchhablani | opened | 4 months ago | 3
#107 | Questions on Speculative Decoding in gpt-fast generate.py | hxer7963 | opened | 4 months ago | 2
#106 | Update README.md | MDK8888 | closed | 4 months ago | 3
#105 | Add Mixtral-8x7B in sub-folder | yanboliang | closed | 4 months ago | 0
#104 | improving GPTQ defauls | HDCharles | closed | 4 months ago | 0
#103 | Remove unnecessary wrapper code | HDCharles | closed | 4 months ago | 0
#102 | [quant] Add int8 per token dynamic quant + int4 per group quant for ExecuTorch | jerryzh168 | opened | 4 months ago | 1
#101 | fixing over padding and GPTQ padding bug | jerryzh168 | closed | 4 months ago | 0
#100 | fixing over padding and GPTQ padding bug | HDCharles | closed | 4 months ago | 0
#99 | Bandwidth achieved for INT8 is much smaller than FP16 | yafehlis | opened | 4 months ago | 3
#97 | fixing circular import | HDCharles | closed | 4 months ago | 1
#94 | pass@1 score extremely low using GPT-fast API | yafehlis | closed | 4 months ago | 3
#93 | Fixes for eval and GPTQ after move to gpt-fast | HDCharles | closed | 5 months ago | 0
#92 | I try to speed up with llava,but this it slower then eager mode,why? | bleedingfight | opened | 5 months ago | 1
#91 | Updating eval for lm_eval 0.4 and 0.3 | HDCharles | closed | 4 months ago | 0
#90 | Can GPT-Fast support larger batch sizes | yetingqiaqia | closed | 4 months ago | 3
#89 | `eval.py` uses older version of lm_eval | nairbv | closed | 4 months ago | 1
#88 | Size mismatch error occurs when loading models quantized by GPTQ | sdc17 | opened | 5 months ago | 1
#87 | RuntimeError: CUDA error: named symbol not found | ce1190222 | opened | 5 months ago | 1
#86 | How is llama-7b trained, what is the verification accuracy? | frankxyy | closed | 4 months ago | 2
#85 | generate.py: remove duplicate if condition | guoyejun | closed | 3 months ago | 0