pytorch-labs / gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
BSD 3-Clause "New" or "Revised" License · 5.36k stars · 485 forks

Issues
#36 · May I ask which version of PyTorch does this project correspond to? · ye1024 · closed 7 months ago · 3 comments
#35 · What would it take to support other models like DeepSeek Coder? · briandw · closed 6 months ago · 3 comments
#34 · Question about position embedding · jchuai · opened 7 months ago · 1 comment
#33 · [example] Changed int8 quantization to do fp8 weight-only quantization · Chillee · opened 7 months ago · 2 comments
#32 · Error(s) in loading state_dict for Transformer · Nikita-Sherstnev · closed 7 months ago · 2 comments
#31 · Help explain "Actually better for Inductor to codegen attention here" · huntzhan · opened 7 months ago · 9 comments
#30 · KeyError: 'model.layers.{}.self_attn.W_pack.weight' · wccccp · opened 7 months ago · 4 comments
#29 · Adding black and isort · chethanuk · opened 7 months ago · 3 comments
#28 · NameError: name 'InputRecorder' is not defined · MrD005 · opened 7 months ago · 2 comments
#27 · Normal inference seems to output more tokens per second · tamil-acog · opened 7 months ago · 1 comment
#26 · What's the input context length for the benchmark results? · YangZhou0417 · closed 6 months ago · 2 comments
#25 · Does it support reasoning acceleration for Qwen-14B? · dashi6174 · opened 7 months ago · 4 comments
#24 · Extended support for existing precision variable · ankitvgupta · opened 7 months ago · 0 comments
#23 · Speed up model loading by 25% · daulet · closed 1 month ago · 2 comments
#22 · Problems with interactive mode · XuandongZhao · opened 7 months ago · 0 comments
#21 · Speculative decoding slows model down, possibly from "skipping cudagraphs due to ['mutated inputs']"? · jamestwhedbee · opened 7 months ago · 7 comments
#20 · Compatible with AutoGPTQ? · yhyu13 · opened 7 months ago · 5 comments
#19 · About benchmark results · 1787648106 · opened 7 months ago · 1 comment
#18 · Is Flash Attention 2 supported? · rajveer43 · closed 7 months ago · 2 comments
#17 · Update README.md · eltociear · closed 7 months ago · 3 comments
#16 · Sort imports · hmosousa · closed 7 months ago · 0 comments
#15 · Inductor op lowering · jeromeku · opened 7 months ago · 3 comments
#14 · Index out of bounds for --compile_prefill with int4 and int8 · lopuhin · opened 7 months ago · 3 comments
#13 · Duplicate import · IvanVnucec · closed 7 months ago · 2 comments
#12 · GPTQ quantization not working · lopuhin · opened 7 months ago · 16 comments
#11 · AttributeError: 'LlamaForCausalLM' object has no attribute 'setup_caches' · ohashi3399 · closed 7 months ago · 3 comments
#10 · Performance loss for int4 compared with AWQ? · lucasjinreal · opened 7 months ago · 0 comments
#9 · Will these optimizations be integrated into HF's code? · lucasjinreal · opened 7 months ago · 7 comments
#8 · Remove the eos parameter of the encode_tokens method · gklab · opened 7 months ago · 2 comments
#7 · I'm trying to use the meta-llama-7b model on my NVIDIA GeForce GTX 1060 6GB · MeDott29 · closed 6 months ago · 2 comments
#6 · AMD quantize · rraulison · opened 7 months ago · 4 comments
#5 · Use weights_only for load · kit1980 · closed 7 months ago · 0 comments
#4 · Update download.py · thorhojhus · closed 7 months ago · 3 comments
#3 · Link to blog · billxbf · closed 7 months ago · 1 comment
#2 · Downloads the whole HF repo · das-projects · opened 7 months ago · 2 comments
#1 · Apple Silicon support? · caseybasichis · opened 7 months ago · 3 comments