turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.69k stars · 283 forks
Issues (sorted by: Newest)
#681 An issue with gemma2-27b-it related to measurement — antonovkz, closed 12 hours ago · 5 comments
#680 [BUG] RuntimeError: index 1000000000 is out of bounds — xonfour, opened 2 days ago · 3 comments
#679 [BUG] Very slow generation with Paged Attention — rjmehta1993, closed 1 day ago · 6 comments
#678 [REQUEST] Passing cache to and from generate() function for use in a loop — cmunna0052, closed 2 days ago · 2 comments
#677 [BUG] Out of memory from a 2.4bpw 70B parameter model — cmunna0052, closed 3 days ago · 3 comments
#676 [BUG] Async with Paged Attention reduces accuracy — rjmehta1993, closed 2 days ago · 8 comments
#675 [REQUEST] Can we have 1.0/1.5 bpw internally? — Originalimoc, opened 5 days ago · 1 comment
#674 [BUG] [Qwen] Draft model produces garbage output — Nepherpitou, opened 1 week ago · 4 comments
#673 [REQUEST] convert.py: Option to skip measurement when setting 8.0/8.0 — Originalimoc, opened 1 week ago · 0 comments
#672 [REQUEST] Support for a Qwen-based vision model — TyraVex, opened 1 week ago · 2 comments
#670 [QUESTION] Does exllamav2 support no-dequant inference? — AaronZLT, opened 2 weeks ago · 1 comment
#669 [REQUEST] Synthetic data generation features — AstrisCantCode, opened 2 weeks ago · 3 comments
#668 [PAPER] New quant method with SOTA quality and speed: QTIP — TyraVex, opened 3 weeks ago · 0 comments
#666 Improve installation experience — SecretiveShell, closed 2 weeks ago · 1 comment
#665 [BUG] How can we increase or reduce the cache size? — royallavanya140, closed 1 week ago · 1 comment
#664 [REQUEST] Alternative to the PyTorch environment variables on Windows for setting PyTorch memory management parameters — Nexesenex, opened 3 weeks ago · 5 comments
#663 [BUG] Out of memory on dual 3090 setup — joshuakoh1, closed 3 weeks ago · 2 comments
#662 [BUG] AMD: out-of-memory errors despite having plenty of VRAM — RSAStudioGames, opened 3 weeks ago · 0 comments
#661 [REQUEST] Modify string probabilities rather than outright banning with banned_strings — atisharma, closed 1 month ago · 4 comments
#660 [REQUEST] Faster 6/8-bit EXL2 quantization — grimulkan, opened 1 month ago · 0 comments
#659 Torch 2.5 — bdashore3, closed 2 weeks ago · 0 comments
#658 [REQUEST] Llama 3.2 Vision support (or does it already exist?) — grimulkan, opened 1 month ago · 13 comments
#657 Implementation of logit threshold sampler and confidence breaker — anchortense, opened 1 month ago · 0 comments
#656 [BUG] Appending runtime LoRA weights — royallavanya140, opened 1 month ago · 2 comments
#655 [BUG] Convert script fails to run on `master` branch as of v0.2.3 — iamwavecut, opened 1 month ago · 5 comments
#654 feat: try to create `out_dir` if it doesn't exist — iamwavecut, closed 1 month ago · 1 comment
#653 [REQUEST] Create the output directory during the quantization process — Nexesenex, closed 1 month ago · 2 comments
#652 [REQUEST] Is it possible to load a model as NF4 and convert it to EXL2? — charleswg, closed 1 month ago · 2 comments
#651 [BUG] Installation fails for AMD MI60 (gfx906) with ROCm 6.1 and 6.2 — Said-Akbar, closed 1 month ago · 1 comment
#650 AntiSlop / banned strings — sam-paech, closed 1 month ago · 1 comment
#649 Enable module type checking — SecretiveShell, closed 3 weeks ago · 0 comments
#648 [BUG] AttributeError: module 'exllamav2_ext' has no attribute 'safetensors_free_pinned_buffer' — Katehuuh, closed 1 month ago · 1 comment
#647 [BUG] `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` crashes on model loading since 0.2.3 — ThomasBaruzier, closed 1 month ago · 7 comments
#646 [REQUEST] Runtime flag to disable XTC sampler — avidwriter, closed 1 month ago · 4 comments
#644 [BUG] Qwen 2.5 32B quantization produces artifacts at any level — Nepherpitou, closed 1 month ago · 2 comments
#642 Add YaRN scaling for Qwen 2.5 — Downtown-Case, closed 1 month ago · 3 comments
#641 [REQUEST] Implement Transformers' YaRN scaling for long context in supported models (e.g. Qwen 2.5) — Downtown-Case, closed 1 month ago · 4 comments
#640 [REQUEST] "Antislop" sampler — Downtown-Case, closed 1 month ago · 2 comments
#639 [BUG] RAM utilisation is increasing rapidly — UTSAV-44, opened 1 month ago · 1 comment
#638 Question about dequantization — HaoWeiWang, closed 1 month ago · 1 comment
#637 Add more args to humaneval — LlamaEnjoyer, closed 1 month ago · 0 comments
#635 Added draft token count as a parameter to chat.py — SinanAkkoyun, closed 1 month ago · 1 comment
#634 Add `ExLlamaV2Sampler.Settings.logits_processor` — lapp0, opened 2 months ago · 4 comments
#633 [BUG] exllamav2-0.2.2+cu118.torch2.4.0-cp310-cp310-win_amd64.whl seems to be missing under releases — Nrgte, closed 1 month ago · 1 comment
#632 [BUG] chat-instruct Llama 3.1 end word "assistant" — Katehuuh, closed 2 months ago · 4 comments
#631 [REQUEST] Is it possible, and would it be much trouble, to support Flux? — Ph0rk0z, opened 2 months ago · 4 comments
#630 [BUG] Random slowdowns in tensor parallel — Ph0rk0z, opened 2 months ago · 2 comments
#629 [REQUEST] Support YaRN for Qwen 2.5 >32K — Downtown-Case, closed 1 month ago · 1 comment
#628 [BUG] Qwen 2.5 34B returns garbage at certain quantization levels, but not others — Downtown-Case, closed 2 months ago · 6 comments
#627 [BUG] Failed to quantize Qwen2.5-Math-72B-Instruct: Measurement/inference error (3): hidden_states — Orion-zhen, opened 2 months ago · 7 comments