# turboderp/exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

MIT License · 3.56k stars · 273 forks
## Issues (newest first)
| # | Title | Author | Status | Comments |
|---|---|---|---|---|
| #194 | Load models' tokenizer.json file with utf-8 encoding. | VldmrB | closed 10 months ago | 1 |
| #193 | support for awq | frankxyy | closed 4 months ago | 0 |
| #192 | Added DeepSeek Coder Instruct to chat example | SinanAkkoyun | closed 10 months ago | 0 |
| #191 | Fixed code block syntax highlighting for long code | SinanAkkoyun | closed 10 months ago | 1 |
| #190 | Chat code syntax highlighter has problems when out of view | SinanAkkoyun | closed 10 months ago | 4 |
| #189 | Add Huggingface tokenzier support | DOGEwbx | closed 10 months ago | 13 |
| #188 | Support for Huggingface Fast Tokenizers | bibekyess | closed 10 months ago | 13 |
| #187 | Support LookaheadDecoding? | yhyu13 | closed 10 months ago | 2 |
| #186 | Batched generations are very similar | anujnayyar1 | closed 10 months ago | 2 |
| #185 | Upcasting all the calculations. | Ph0rk0z | opened 10 months ago | 3 |
| #184 | Any plans for Mac/Metal support? | aikitoria | closed 9 months ago | 3 |
| #183 | How do I deal with custom EOS? | teknium1 | closed 10 months ago | 1 |
| #182 | fix infer websocket_actions.py | Kerushii | closed 10 months ago | 0 |
| #181 | Different groupsizes for different bitwidths? | dalistarh | closed 10 months ago | 4 |
| #180 | [Question] `h_gemm` Implementation | jeromeku | closed 9 months ago | 2 |
| #179 | convert.py ending with "Killed" at lm_head layer when converting zephyr-7b | Christopheraburns | closed 4 months ago | 1 |
| #176 | QuIP#, New SOTA(?) 2-bit Quantization Method | brucethemoose | opened 10 months ago | 6 |
| #175 | Is there a colab or something which shows all the code necessary to setup the project? | slooi | closed 10 months ago | 1 |
| #174 | EXL2 Proposal: Llamacpp Allocation Heuristics and Specialisation Degree Feature | aljungberg | closed 4 months ago | 12 |
| #173 | Windows 10, Oobabooga, inability to load some models | homeworkace | closed 10 months ago | 3 |
| #171 | Improve model load performance | abstractdescutcheon | closed 8 months ago | 4 |
| #169 | Lora doesn't impact the model outputs | matankley | closed 9 months ago | 1 |
| #168 | AssertionError: Insufficient space in device allocation | Double-bear | closed 9 months ago | 2 |
| #167 | Update base.py to remove useless statement | MilesQLi | closed 9 months ago | 0 |
| #166 | OOM with gpu_split_auto, must specify split manually | cikkle | closed 4 months ago | 8 |
| #165 | Performance with speculative decoding is slightly worse than without at full context | cikkle | closed 9 months ago | 3 |
| #164 | dynamic enable special tokens | wangyu1997 | closed 8 months ago | 1 |
| #163 | How can you clear the cache of the exllamav2? | Rajmehta123 | closed 5 months ago | 2 |
| #162 | CPU offloading? | oobabooga | closed 10 months ago | 1 |
| #160 | self.rms_norm_eps = read_config["rms_norm_eps"] KeyError: 'rms_norm_eps' (Qwen model not supported) | tutu329 | closed 9 months ago | 8 |
| #159 | No module named 'exllamav2_ext' when loading a model | ParisNeo | closed 5 months ago | 11 |
| #158 | exllamav2 Installation Error : [Errno 20] Not a directory: 'hipconfig'. | watchstep | closed 10 months ago | 6 |
| #157 | Tweak to multiple cache example | dvianisoho | closed 11 months ago | 2 |
| #156 | Got error when converting Llama2 70b | GolemXlV | closed 11 months ago | 2 |
| #155 | Yi models generates some languages in an incorrect encoding | samuelazran | closed 11 months ago | 2 |
| #154 | Simplify HIP compatibility | ardfork | closed 11 months ago | 8 |
| #153 | Very poor performance when VRAM is nearly full (inconsistent) | QM60 | closed 6 months ago | 13 |
| #152 | Error When Converting Safetensor to exl2 | Noobville1345 | closed 9 months ago | 3 |
| #151 | 70B Quant Potential Issue | azureblackprime | closed 11 months ago | 2 |
| #150 | Support for 01-ai/Yi series of models? | anujnayyar1 | closed 11 months ago | 2 |
| #149 | Suggestion: allow different context lengths for draft model and main model in speculative sampling | Antollo | closed 11 months ago | 3 |
| #148 | CU12+ appears to be unsupported? | SebJansen | closed 9 months ago | 15 |
| #146 | response id and infer modification | Kerushii | closed 10 months ago | 1 |
| #145 | Implement Efficient Streaming Language Models with Attention Sinks | BigArty | closed 6 months ago | 1 |
| #144 | rocm5.2.0 torch run test_inference.py error undefined symbol: hipblasHgemm | Rane2021 | closed 9 months ago | 2 |
| #143 | [ROCM] Gibberish on wave64 devices | IMbackK | closed 11 months ago | 1 |
| #142 | The turboderp 70B quants on HuggingFace are broken | 11415142513152119 | closed 11 months ago | 2 |
| #140 | Suggestion: Implement MinP | Chanka0 | closed 11 months ago | 4 |
| #139 | how many time should it spend to convert llama2 70b to 2bit? on single V10032GB | zlh1992 | closed 9 months ago | 2 |
| #138 | Exl2 model fails to run -- RuntimeError: Insufficient size of temp_dq buffer | truenorth8 | closed 11 months ago | 2 |