turboderp/exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License · 2.66k stars · 214 forks
Issues
#258 · Tried to build exllama but encountering ninja-related errors; can someone please help me? · BwandoWando · opened 10 months ago · 3 comments
#257 · Stop-string support? · krypterro · opened 10 months ago · 2 comments
#256 · Request: some improvements to web app.py · Midaychi · opened 10 months ago · 0 comments
#255 · Refine JSON dicts for WS example · Kerushii · closed 10 months ago · 0 comments
#254 · Bad output for 2080 Ti · filipemesquita · opened 10 months ago · 1 comment
#253 · GPU usage stays high even without inference load · leonxia1018 · opened 10 months ago · 7 comments
#252 · Is it possible to do batch generation? · fahadh4ilyas · opened 10 months ago · 7 comments
#251 · Are we *really* using NVLink? · Ph0rk0z · closed 10 months ago · 1 comment
#250 · Recover unsaved modifications · Kerushii · closed 10 months ago · 3 comments
#249 · WS example for streaming with context reuse and token testing · Kerushii · closed 10 months ago · 0 comments
#248 · Custom multiple stop tokens (for roleplay / conversation) · wangerzi · closed 10 months ago · 6 comments
#245 · Possible to load a model with low system RAM? · gros87 · opened 10 months ago · 4 comments
#244 · RuntimeError: temp_state buffer is too small · daniel-kukiela · closed 10 months ago · 1 comment
#243 · Modify generator.py > generate_simple to accept encode_special_characters? · zmarty · opened 11 months ago · 1 comment
#242 · "Header too large" error when running benchmark · DKormann · closed 10 months ago · 2 comments
#241 · Is there a way to make compress_pos_emb dynamic? · fahadh4ilyas · closed 10 months ago · 2 comments
#240 · Can max_seq_len be set via CLI or GUI in the webui? · int19h · closed 10 months ago · 2 comments
#238 · KV caching? · bryanhpchiang · opened 11 months ago · 2 comments
#237 · Continuous batching support · FireMasterK · opened 11 months ago · 0 comments
#236 · Generation uses config.max_seq_len instead of the default 2048 · flotos · closed 11 months ago · 1 comment
#235 · Question about example_flask.py · ZeroYuJie · opened 11 months ago · 1 comment
#234 · Question about sampling and kernel fusion · sleepwalker2017 · closed 11 months ago · 6 comments
#233 · RuntimeError with airoboros-l2-13b · corv89 · closed 11 months ago · 2 comments
#232 · Strange output / doesn't make any sense · lordwebbie · closed 11 months ago · 5 comments
#231 · Slower tokens/s than expected · teknium1 · opened 11 months ago · 14 comments
#230 · Support for NF4? · hoagy-davis-digges · opened 11 months ago · 1 comment
#226 · [Bug]: Sampling fails when temperature is 0 · kogolobo · opened 11 months ago · 4 comments
#225 · Hangs after reboot caused by triple fault · SolsticeProjekt · closed 11 months ago · 3 comments
#224 · Fix HIP on recent PyTorch versions · ardfork · closed 11 months ago · 0 comments
#223 · Custom stop tokens in generator.py · Kerushii · closed 10 months ago · 1 comment
#222 · Please handle the case where logits contain NaNs · ParisNeo · opened 11 months ago · 1 comment
#221 · Llama 2 Chat implementation · SinanAkkoyun · opened 11 months ago · 10 comments
#220 · Weird issue with context length · zzzacwork · opened 11 months ago · 6 comments
#219 · Which Llama model do you use? Could you give a download link? · sleepwalker2017 · closed 11 months ago · 3 comments
#218 · Speculative decoding? · bryanhpchiang · opened 11 months ago · 17 comments
#217 · Very bad response · pourfard · closed 10 months ago · 9 comments
#216 · Reply is too short · hengjiUSTC · closed 11 months ago · 4 comments
#215 · How to extend context with Llama 2? · ShahZ181 · closed 11 months ago · 3 comments
#214 · Question about storing models in a container · JacobGoldenArt · opened 11 months ago · 2 comments
#212 · [Feature Request] OpenAI-compatible API · langchain4j · closed 11 months ago · 11 comments
#211 · "temp_state buffer is too small" when using Llama 13B at full context length · anujnayyar1 · closed 11 months ago · 5 comments
#210 · Add example of max seq length configuration · vadi2 · closed 11 months ago · 2 comments
#209 · Compile kernel · xiaoxiangshusheng · closed 11 months ago · 1 comment
#208 · Unable to split across multiple AMD GPUs · TNT3530 · closed 11 months ago · 4 comments
#207 · Infinities during model evaluation · 50h100a · closed 11 months ago · 8 comments
#206 · How to shard model and batched cache equally? · nivibilla · closed 11 months ago · 4 comments
#205 · Can't assign model to multiple GPUs · nivibilla · closed 11 months ago · 1 comment
#202 · Latency grows substantially as batch size increases, even with small batch sizes · joehoover · opened 11 months ago · 2 comments
#201 · Fixed seed doesn't work in ooba's webui · BadisG · closed 11 months ago · 3 comments
#200 · . · mrbianchi · closed 11 months ago · 0 comments