pytorch-labs / gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License
5.33k stars · 484 forks
Issues (sorted by newest)
#186  tokenizer.model (by hasakikiki, opened 3 days ago, 0 comments)
#185  It doesn't accelerate very well at L4 (by songh11, opened 6 days ago, 0 comments)
#184  getting different acceptance prob when using `torch.compile` after making a small change. (by kalradivyanshu, opened 1 week ago, 0 comments)
#183  Question about the ENABLE_INTRA_NODE_COMM for speculative decoding (by jianc99, opened 1 week ago, 9 comments)
#182  GGUF support? (by yukiarimo, opened 2 weeks ago, 0 comments)
#181  Fix rope base issue with llama 3 (by VikParuchuri, closed 2 weeks ago, 3 comments)
#180  [WIP] Use DTensor-based tensor parallel (by kwen2501, opened 2 weeks ago, 0 comments)
#179  `meta-llama/Meta-Llama-3-8B-Instruct` generates gibberish for long prompts (by griff4692, closed 2 weeks ago, 5 comments)
#178  Update installation instructions in README.md (by Jokeren, closed 5 days ago, 1 comment)
#177  Hard-coded Llama-3 model name pattern matching breaks scripts/convert_hf_checkpoint.py (by ephremw, opened 1 month ago, 0 comments)
#176  Update Grok-1 and DBRX support in README (by yanboliang, closed 1 month ago, 0 comments)
#175  Remove nn.Embedding layer from model size (by yanboliang, closed 1 month ago, 0 comments)
#174  [example] Add support for DBRX (by yanboliang, opened 1 month ago, 0 comments)
#173  Throughput Benchmark Scripts (by HanGuo97, closed 1 month ago, 2 comments)
#172  Missing Keys in state_dict (by bjohn22, opened 1 month ago, 1 comment)
#171  [example] Added (hacky) Grok1 support (by Chillee, opened 1 month ago, 2 comments)
#170  Making TokenizerInterface more usable for the user's code. (by Artyom17, opened 2 months ago, 0 comments)
#169  Unified Llama 3 (8b,70b) + Safetensors support (by nivibilla, closed 5 days ago, 20 comments)
#168  Unified llama 3 support. (by nivibilla, closed 2 months ago, 1 comment)
#167  Tensor Parallel Inside notebook (by nivibilla, opened 2 months ago, 0 comments)
#166  Llama3 8b perf numbers on A100 (by yanboliang, closed 2 weeks ago, 0 comments)
#165  mmap issue in bf16 of gpt-fast (by yanbing-j, opened 2 months ago, 0 comments)
#164  Remove used empty variable (by yncxcw, opened 2 months ago, 2 comments)
#163  Add download script for tinyllamas (by yiliu30, opened 2 months ago, 2 comments)
#162  Naming: n_local_heads -> n_kv_heads (by ad8e, opened 2 months ago, 0 comments)
#161  Optimize Int8 Woq for CPU (by yanbing-j, opened 2 months ago, 2 comments)
#160  Input token length question (by kaizizzzzzz, closed 2 months ago, 2 comments)
#159  Fixing quantize in int4 mode (by Artyom17, opened 2 months ago, 4 comments)
#158  llama3 8B support, tiktoken tokenizer (by Artyom17, closed 2 months ago, 21 comments)
#157  fix input_pos shape in comment (by YassineYousfi, opened 2 months ago, 2 comments)
#156  shape fix for gptq (by HDCharles, closed 2 months ago, 0 comments)
#155  testing HQQ [not for land] (by HDCharles, opened 2 months ago, 0 comments)
#154  INT4 quantization not working on MI210 (by yafehlis, opened 2 months ago, 1 comment)
#153  Fix compile_prefill to prevent CUDA error (by PasserBy4, opened 2 months ago, 2 comments)
#152  Fix int4 quantization (by malfet, closed 2 months ago, 0 comments)
#151  Enable TinyLLAMAs quantization (by malfet, closed 2 months ago, 0 comments)
#150  Tiny Llamas Not Found (by guihao-liang, closed 2 months ago, 2 comments)
#149  On the memory usage of `ConditionalFeedForward` (by carmocca, closed 2 months ago, 4 comments)
#148  fixing GPTQ (by HDCharles, opened 3 months ago, 0 comments)
#147  fixing GPTQ (by HDCharles, opened 3 months ago, 0 comments)
#146  Infer cache/RoPE weight dtype from output weights (by malfet, closed 3 months ago, 0 comments)
#145  Add CPU support in mixtral-moe for int8 woq (by yanbing-j, closed 2 months ago, 3 comments)
#144  int8 Woq raise Codegen Error with `--compile_prefill` (by yanbing-j, opened 3 months ago, 4 comments)
#143  Add gpt-accelera project to the community section of README (by Edward-Sun, closed 3 months ago, 3 comments)
#142  int4 gptq shape fix (by HDCharles, closed 3 months ago, 0 comments)
#141  Fixing block size for Mistral-7B. (by Artyom17, opened 3 months ago, 1 comment)
#140  Question about large sequence length attention kernels (by loubbrad, opened 3 months ago, 1 comment)
#139  Update README.md to list tinyllamas (by mikekgfb, closed 3 months ago, 0 comments)
#138  Add support for TinyLLAMAs (by malfet, closed 3 months ago, 0 comments)
#137  CUDA error if enabling compile_prefill for quantization model (int8) (by yanboliang, opened 3 months ago, 7 comments)