pytorch-labs / gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License
5.33k stars · 484 forks
Issues (sorted by newest)
#186  tokenizer.model (by hasakikiki, opened 3 days ago, 0 comments)
#185  It doesn't accelerate very well at L4 (by songh11, opened 6 days ago, 0 comments)
#184  getting different acceptance prob when using `torch.compile` after making a small change. (by kalradivyanshu, opened 1 week ago, 0 comments)
#183  Question about the ENABLE_INTRA_NODE_COMM for speculative decoding (by jianc99, opened 1 week ago, 9 comments)
#182  GGUF support? (by yukiarimo, opened 2 weeks ago, 0 comments)
#181  Fix rope base issue with llama 3 (by VikParuchuri, closed 2 weeks ago, 3 comments)
#180  [WIP] Use DTensor-based tensor parallel (by kwen2501, opened 2 weeks ago, 0 comments)
#179  `meta-llama/Meta-Llama-3-8B-Instruct` generates gibberish for long prompts (by griff4692, closed 2 weeks ago, 5 comments)
#178  Update installation instructions in README.md (by Jokeren, closed 5 days ago, 1 comment)
#177  Hard-coded Llama-3 model name pattern matching breaks scripts/convert_hf_checkpoint.py (by ephremw, opened 1 month ago, 0 comments)
#176  Update Grok-1 and DBRX support in README (by yanboliang, closed 1 month ago, 0 comments)
#175  Remove nn.Embedding layer from model size (by yanboliang, closed 1 month ago, 0 comments)
#174  [example] Add support for DBRX (by yanboliang, opened 1 month ago, 0 comments)
#173  Throughput Benchmark Scripts (by HanGuo97, closed 1 month ago, 2 comments)
#172  Missing Keys in state_dict (by bjohn22, opened 1 month ago, 1 comment)
#171  [example] Added (hacky) Grok1 support (by Chillee, opened 1 month ago, 2 comments)
#170  Making TokenizerInterface more usable for the user's code. (by Artyom17, opened 2 months ago, 0 comments)
#169  Unified Llama 3 (8b,70b) + Safetensors support (by nivibilla, closed 5 days ago, 20 comments)
#168  Unified llama 3 support. (by nivibilla, closed 2 months ago, 1 comment)
#167  Tensor Parallel Inside notebook (by nivibilla, opened 2 months ago, 0 comments)
#166  Llama3 8b perf numbers on A100 (by yanboliang, closed 2 weeks ago, 0 comments)
#165  mmap issue in bf16 of gpt-fast (by yanbing-j, opened 2 months ago, 0 comments)
#164  Remove used empty variable (by yncxcw, opened 2 months ago, 2 comments)
#163  Add download script for tinyllamas (by yiliu30, opened 2 months ago, 2 comments)
#162  Naming: n_local_heads -> n_kv_heads (by ad8e, opened 2 months ago, 0 comments)
#161  Optimize Int8 Woq for CPU (by yanbing-j, opened 2 months ago, 2 comments)
#160  Input token length question (by kaizizzzzzz, closed 2 months ago, 2 comments)
#159  Fixing quantize in int4 mode (by Artyom17, opened 2 months ago, 4 comments)
#158  llama3 8B support, tiktoken tokenizer (by Artyom17, closed 2 months ago, 21 comments)
#157  fix input_pos shape in comment (by YassineYousfi, opened 2 months ago, 2 comments)
#156  shape fix for gptq (by HDCharles, closed 2 months ago, 0 comments)
#155  testing HQQ [not for land] (by HDCharles, opened 2 months ago, 0 comments)
#154  INT4 quantization not working on MI210 (by yafehlis, opened 2 months ago, 1 comment)
#153  Fix compile_prefill to prevent CUDA error (by PasserBy4, opened 2 months ago, 2 comments)
#152  Fix int4 quantization (by malfet, closed 2 months ago, 0 comments)
#151  Enable TinyLLAMAs quantization (by malfet, closed 2 months ago, 0 comments)
#150  Tiny Llamas Not Found (by guihao-liang, closed 2 months ago, 2 comments)
#149  On the memory usage of `ConditionalFeedForward` (by carmocca, closed 2 months ago, 4 comments)
#148  fixing GPTQ (by HDCharles, opened 3 months ago, 0 comments)
#147  fixing GPTQ (by HDCharles, opened 3 months ago, 0 comments)
#146  Infer cache/RoPE weight dtype from output weights (by malfet, closed 3 months ago, 0 comments)
#145  Add CPU support in mixtral-moe for int8 woq (by yanbing-j, closed 2 months ago, 3 comments)
#144  int8 Woq raise Codegen Error with `--compile_prefill` (by yanbing-j, opened 3 months ago, 4 comments)
#143  Add gpt-accelera project to the community section of README (by Edward-Sun, closed 3 months ago, 3 comments)
#142  int4 gptq shape fix (by HDCharles, closed 3 months ago, 0 comments)
#141  Fixing block size for Mistral-7B. (by Artyom17, opened 3 months ago, 1 comment)
#140  Question about large sequence length attention kernels (by loubbrad, opened 3 months ago, 1 comment)
#139  Update README.md to list tinyllamas (by mikekgfb, closed 3 months ago, 0 comments)
#138  Add support for TinyLLAMAs (by malfet, closed 3 months ago, 0 comments)
#137  CUDA error if enabling compile_prefill for quantization model (int8) (by yanboliang, opened 3 months ago, 7 comments)