pytorch-labs / gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
BSD 3-Clause "New" or "Revised" License · 5.36k stars · 485 forks

Issues
#36 · May I ask which version of PyTorch does this project correspond to? · ye1024 · closed 7 months ago · 3 comments
#35 · What would it take to support other models like DeepSeek Coder? · briandw · closed 6 months ago · 3 comments
#34 · Question about position embedding · jchuai · opened 7 months ago · 1 comment
#33 · [example] Changed int8 quantization to do fp8 weight-only quantization · Chillee · opened 7 months ago · 2 comments
#32 · Error(s) in loading state_dict for Transformer · Nikita-Sherstnev · closed 7 months ago · 2 comments
#31 · Help explain "Actually better for Inductor to codegen attention here" · huntzhan · opened 7 months ago · 9 comments
#30 · KeyError: 'model.layers.{}.self_attn.W_pack.weight' · wccccp · opened 7 months ago · 4 comments
#29 · Adding black and isort · chethanuk · opened 7 months ago · 3 comments
#28 · NameError: name 'InputRecorder' is not defined · MrD005 · opened 7 months ago · 2 comments
#27 · Normal inference seems to output more tokens per second · tamil-acog · opened 7 months ago · 1 comment
#26 · What's the input context length for the benchmark results? · YangZhou0417 · closed 6 months ago · 2 comments
#25 · Does it support reasoning acceleration for Qwen-14B? · dashi6174 · opened 7 months ago · 4 comments
#24 · Extended support for existing precision variable · ankitvgupta · opened 7 months ago · 0 comments
#23 · Speed up model loading by 25% · daulet · closed 1 month ago · 2 comments
#22 · Problems with interactive mode · XuandongZhao · opened 7 months ago · 0 comments
#21 · Speculative decoding slows model down, possibly from "skipping cudagraphs due to ['mutated inputs']"? · jamestwhedbee · opened 7 months ago · 7 comments
#20 · Compatible with AutoGPTQ? · yhyu13 · opened 7 months ago · 5 comments
#19 · About benchmark results · 1787648106 · opened 7 months ago · 1 comment
#18 · Is Flash Attention 2 supported? · rajveer43 · closed 7 months ago · 2 comments
#17 · Update README.md · eltociear · closed 7 months ago · 3 comments
#16 · Sort imports · hmosousa · closed 7 months ago · 0 comments
#15 · Inductor op lowering · jeromeku · opened 7 months ago · 3 comments
#14 · Index out of bounds for --compile_prefill with int4 and int8 · lopuhin · opened 7 months ago · 3 comments
#13 · Duplicate import · IvanVnucec · closed 7 months ago · 2 comments
#12 · GPTQ quantization not working · lopuhin · opened 7 months ago · 16 comments
#11 · AttributeError: 'LlamaForCausalLM' object has no attribute 'setup_caches' · ohashi3399 · closed 7 months ago · 3 comments
#10 · Performance loss for int4 compared with AWQ? · lucasjinreal · opened 7 months ago · 0 comments
#9 · Will these optimizations be integrated into HF's code? · lucasjinreal · opened 7 months ago · 7 comments
#8 · Remove the eos parameter of the encode_tokens method · gklab · opened 7 months ago · 2 comments
#7 · I'm trying to use the meta-llama-7b model on my NVIDIA GeForce GTX 1060 6GB · MeDott29 · closed 6 months ago · 2 comments
#6 · AMD quantize · rraulison · opened 7 months ago · 4 comments
#5 · Use weights_only for load · kit1980 · closed 7 months ago · 0 comments
#4 · Update download.py · thorhojhus · closed 7 months ago · 3 comments
#3 · Link to blog · billxbf · closed 7 months ago · 1 comment
#2 · Downloads the whole HF repo · das-projects · opened 7 months ago · 2 comments
#1 · Apple Silicon support? · caseybasichis · opened 7 months ago · 3 comments