turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License, 3.18k stars, 233 forks
Issues
#466 Convert.py quantization abruptly failing without errors (engadine1997, closed 1 month ago, 4 comments)
#465 Command-R plus OOM 0.0.18 -> 0.0.19 (kennylin0309, opened 1 month ago, 9 comments)
#464 Using ExLlamaV2 with Phi-3-Vision (CyberTimon, opened 1 month ago, 0 comments)
#463 Fixed minor typo in convert.md doc (iamrohitanshu, closed 1 month ago, 0 comments)
#462 add layer GPU offloading for hidden/target states (kallewoof, closed 3 weeks ago, 0 comments)
#461 Integration with Hugging Face transformers library (SunMarc, opened 1 month ago, 5 comments)
#459 Error when trying to quantize Viking-7B (minipasila, closed 1 week ago, 7 comments)
#458 max_attention_size should be max_input_len**2 ? (laoda513, closed 1 month ago, 1 comment)
#457 undefined symbol: _ZN3c104cuda9SetDeviceEi (icivi, opened 1 month ago, 1 comment)
#456 optimization: reduce GPU transfers (kallewoof, closed 1 month ago, 3 comments)
#455 optimization: put rfn_sum on cuda and do .item() call out of for loop (kallewoof, closed 1 month ago, 3 comments)
#454 what does `make_sequential` do when using gptq inference? (sleepwalker2017, opened 1 month ago, 2 comments)
#453 Error trying to quantize cognitivecomputations/dolphin-2.9.1-qwen-110b (bablat, closed 3 weeks ago, 7 comments)
#452 integrate xformers (laoda513, closed 1 month ago, 2 comments)
#451 ExLlamaV2StreamingGenerator error (nktice, closed 1 month ago, 4 comments)
#450 Scaling inference throughput when increasing the batch size (lopuhin, opened 1 month ago, 1 comment)
#449 Load generation_config.json from compatible models (nickpotafiy, closed 1 month ago, 1 comment)
#448 Installing exllama failed (freQuensy23-coder, closed 2 weeks ago, 3 comments)
#447 Addition of DRY: A modern repetition penalty that reliably prevents looping (awtrisk, opened 1 month ago, 7 comments)
#445 Issue with dolphin mixtral8x22b (luijait, opened 1 month ago, 1 comment)
#444 Integration with txtai for RAG (edwardsmith999, opened 1 month ago, 2 comments)
#443 DeepSeek V2 support (SinanAkkoyun, opened 1 month ago, 4 comments)
#442 Control Vectors (acidbubbles, opened 1 month ago, 0 comments)
#439 config.py (Huzaif2309, closed 2 weeks ago, 1 comment)
#438 [question] how to make generation deterministic? (yshui, closed 2 weeks ago, 1 comment)
#436 custom bos for cat (Kerushii, closed 1 month ago, 1 comment)
#435 Quantized Llama3 inference not working (BenjaminGantenbein, closed 1 month ago, 2 comments)
#434 Update from 0.0.19 to 0.0.20 with Python 3.11, torch 2.2.1 and CUDA 12.1: DLL load failed while importing exllamav2_ext: The specified procedure could not be found. (acidbubbles, closed 2 weeks ago, 11 comments)
#433 Qwen-110B quantize failed, RuntimeError: CUDA error: an illegal memory access was encountered (buliaoyin, closed 2 months ago, 3 comments)
#431 Layer Skip looks interesting (SinanAkkoyun, closed 2 months ago, 0 comments)
#428 FP16 + ROCm Possibly Subpar Performance (Beinsezii, closed 2 weeks ago, 3 comments)
#427 ImportError: /home/ec2-user/.cache/torch_extensions/py310_cu121/exllamav2_ext/exllamav2_ext.so: undefined symbol: _ZN3c104cuda14ExchangeDeviceEa (rjmehta1993, closed 1 week ago, 3 comments)
#426 Added option to prevent tokens from being penalized for repetition (Lyrcaxis, closed 1 month ago, 2 comments)
#425 Phi-3 Support (candre23, closed 2 months ago, 8 comments)
#424 Error when trying to quantize Llama3 70B instruct: module 'exllamav2_ext' has no attribute 'sim_anneal' (RodriMora, closed 2 months ago, 2 comments)
#422 Piece ID is out of range. (Ph0rk0z, closed 2 months ago, 4 comments)
#421 Action to build wheels on ROCm 6.0 (Orion-zhen, closed 2 months ago, 9 comments)
#420 Error loading turboderp/Llama-3-70B-Instruct-exl2 (pfan94, closed 2 months ago, 2 comments)
#419 CUDA OOM when quantizing Llama-3-8B-Instruct (dog3-l0ver, closed 2 months ago, 2 comments)
#417 Fix ROCm compile (turboderp, closed 2 months ago, 0 comments)
#416 Merge dev branch (turboderp, closed 2 months ago, 0 comments)
#415 Llama3 doesn't define `pad_token_id`, it defaults to 0, which tokenizer.json has as the token ID for '!' (VldmrB, closed 2 months ago, 9 comments)
#414 llama3 quant error (bdambrosio, closed 2 months ago, 2 comments)
#413 Torch error when loading GPTQ model (Fuckingnameless, closed 2 months ago, 8 comments)
#411 Triton based flash attention 2 that supports volta and up. (Ph0rk0z, closed 2 weeks ago, 11 comments)
#410 mixtral-8x22b / command-r generation wanders off. (bdambrosio, closed 2 months ago, 2 comments)
#409 An Issue with Finetuning of GPTQ-LoRA with ExllamaV2 MatMul Kernel (achew010, opened 2 months ago, 1 comment)
#408 Cannot load models saved with HF transformers due to shared tensors in safetensors (AndrewRyanChama, closed 2 weeks ago, 1 comment)
#407 Simple QuaRot proof of concept. (sgsdxzy, opened 2 months ago, 4 comments)
#405 TypeError: make_q_matrix(): incompatible function arguments when quantizing Cohere Command R v0.1 (welnaseth, closed 2 months ago, 1 comment)