turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.18k stars · 233 forks
Issues (newest first)
#521 · Lazy loading of 2 models gives CUDA out of memory · waterangel91 · opened 22 hours ago · 2 comments
#520 · Flask-based paged attention with streaming and a generator queue to dynamically add and remove jobs · rjmehta1993 · opened 1 day ago · 2 comments
#519 · Question on async generator · waterangel91 · closed 3 days ago · 6 comments
#518 · Problem running dynamic_gen.py from the examples · rjmehta1993 · closed 1 day ago · 2 comments
#517 · Problem changing characters in SillyTavern · 1Q18LAqakl · opened 6 days ago · 21 comments
#516 · Dumb quantize / selective recompile / recapitation? · tau0-deltav · opened 1 week ago · 1 comment
#515 · Chameleon support · end-me-please · opened 1 week ago · 2 comments
#513 · inference_json example not working · rjmehta1993 · closed 5 days ago · 3 comments
#512 · Support for architecture DeepseekV2ForCausalLM · RodriMora · opened 1 week ago · 10 comments
#511 · test_inference.py: --low_mem is broken unless --max_output_len is also set · IMbackK · closed 1 week ago · 1 comment
#510 · ROCm: issues with wave64 devices · IMbackK · closed 2 days ago · 24 comments
#509 · Problem when quantizing Qwen2 · 1Q18LAqakl · opened 1 week ago · 9 comments
#508 · Extremely high CPU usage · sfttfs · closed 1 week ago · 10 comments
#507 · Quick question on dynamic generation · dnhkng · closed 1 week ago · 2 comments
#506 · Increase GPU utilization? · sunflower-leaf · closed 1 week ago · 4 comments
#505 · Importing exllamav2.generator hangs · lovebeatz · opened 1 week ago · 21 comments
#504 · Convert.py measurement pass "Killed" · GHBigD · opened 1 week ago · 6 comments
#503 · Floating point exception when context length > chunk_size · cikkle · opened 1 week ago · 7 comments
#502 · RuntimeError: FlashAttention only supports Ampere GPUs or newer, despite requesting flash attention be disabled · quarterturn · opened 2 weeks ago · 1 comment
#501 · Return last state in dynamic generator · nickpotafiy · opened 2 weeks ago · 0 comments
#499 · Q-cache: token generation speed · Vhallo · opened 2 weeks ago · 4 comments
#498 · Problems with quantization and Qwen2 inference · nikitabalakin · closed 2 weeks ago · 4 comments
#497 · Update Actions · bdashore3 · closed 2 weeks ago · 0 comments
#496 · Running humaneval against a llama-3-8b-instruct EXL2 quant results in a silent OOM when samples per task > 7 · LlamaEnjoyer · opened 2 weeks ago · 5 comments
#495 · "Loading exllamav2_ext extension (JIT)... Building C++/CUDA extension" hangs forever · AgeOfAlgorithms · closed 2 weeks ago · 3 comments
#494 · EXL2 format spec? · polarathene · closed 1 week ago · 3 comments
#493 · Qwen2 inference problem · Sadeghi85 · closed 1 week ago · 15 comments
#491 · Make comments in the README look better · CyberTimon · closed 2 weeks ago · 0 comments
#490 · After updating exllamav2 to 0.1.0, text generated by exui is no longer pushed verbatim · xldistance · opened 2 weeks ago · 1 comment
#489 · Quantization of glm4-9b failed · Orion-zhen · opened 2 weeks ago · 3 comments
#488 · Added steps to benchmark in README · RodriMora · closed 2 weeks ago · 1 comment
#487 · Added steps to benchmark using the included mmlu script · RodriMora · closed 3 weeks ago · 0 comments
#486 · LM Enforcer causes hung generation; what is the correct sampler setting? · waterangel91 · closed 3 weeks ago · 17 comments
#485 · v0.1.3: lm format enforcer broken · waterangel91 · closed 3 weeks ago · 2 comments
#484 · Non-stop generation after updating to v0.1.1.0 and the latest flash attention · waterangel91 · closed 3 weeks ago · 6 comments
#483 · v0.1.1 multi-GPU issue (fine in v0.0.21) · surenchl · closed 3 weeks ago · 1 comment
#482 · Quick, non-data-driven quantization · alexbrowngh · closed 1 week ago · 3 comments
#481 · module 'exllamav2_ext' has no attribute 'count_match' · abpani · closed 1 week ago · 1 comment
#480 · Can I disable flash attention? The model needs to be deployed on an NVIDIA T4 · vikotse · closed 1 week ago · 4 comments
#479 · Support MiniCPM architecture · meigami0 · closed 1 week ago · 5 comments
#476 · [feature request] llama.cpp · 0wwafa · closed 3 weeks ago · 3 comments
#475 · Problem with blinker · 0wwafa · opened 4 weeks ago · 3 comments
#474 · Phi-3 medium generation issue · rjmehta1993 · opened 4 weeks ago · 3 comments
#472 · Quantization fails while writing shards · theyunt · closed 4 weeks ago · 2 comments
#470 · What is the implementation of "segmenting input" in exllamav2 called? · laoda513 · closed 1 month ago · 3 comments
#469 · Dynamic gen is slower?! · Ph0rk0z · closed 1 week ago · 4 comments
#468 · Cannot load Llama-3 8B Instruct: incompatible function arguments · nickpotafiy · closed 4 weeks ago · 3 comments
#467 · ROCm version 0.1.0: getting errors · hvico · closed 1 month ago · 2 comments
#466 · Convert.py quantization abruptly failing without errors · engadine1997 · closed 4 weeks ago · 4 comments
#465 · Command-R Plus OOM going from 0.0.18 to 0.0.19 · kennylin0309 · opened 1 month ago · 9 comments