turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License, 3.18k stars, 233 forks
Issues
#466 Convert.py quantization abruptly failing without errors (engadine1997, closed 1 month ago, 4 comments)
#465 Command-R plus OOM 0.0.18 -> 0.0.19 (kennylin0309, opened 1 month ago, 9 comments)
#464 Using ExLlamaV2 with Phi-3-Vision (CyberTimon, opened 1 month ago, 0 comments)
#463 Fixed minor typo in convert.md doc (iamrohitanshu, closed 1 month ago, 0 comments)
#462 add layer GPU offloading for hidden/target states (kallewoof, closed 3 weeks ago, 0 comments)
#461 Integration with Hugging Face transformers library (SunMarc, opened 1 month ago, 5 comments)
#459 Error when trying to quantize Viking-7B (minipasila, closed 1 week ago, 7 comments)
#458 max_attention_size should be max_input_len**2 ? (laoda513, closed 1 month ago, 1 comment)
#457 undefined symbol: _ZN3c104cuda9SetDeviceEi (icivi, opened 1 month ago, 1 comment)
#456 optimization: reduce GPU transfers (kallewoof, closed 1 month ago, 3 comments)
#455 optimization: put rfn_sum on cuda and do .item() call out of for loop (kallewoof, closed 1 month ago, 3 comments)
#454 what does `make_sequential` do when using gptq inference? (sleepwalker2017, opened 1 month ago, 2 comments)
#453 Error trying to quantize cognitivecomputations/dolphin-2.9.1-qwen-110b (bablat, closed 3 weeks ago, 7 comments)
#452 integrate xformers (laoda513, closed 1 month ago, 2 comments)
#451 ExLlamaV2StreamingGenerator error (nktice, closed 1 month ago, 4 comments)
#450 Scaling inference throughput when increasing the batch size (lopuhin, opened 1 month ago, 1 comment)
#449 Load generation_config.json from compatible models (nickpotafiy, closed 1 month ago, 1 comment)
#448 Installing exllama failed (freQuensy23-coder, closed 2 weeks ago, 3 comments)
#447 Addition of DRY: A modern repetition penalty that reliably prevents looping (awtrisk, opened 1 month ago, 7 comments)
#445 Issue with dolphin mixtral8x22b (luijait, opened 1 month ago, 1 comment)
#444 Integration with txtai for RAG (edwardsmith999, opened 1 month ago, 2 comments)
#443 DeepSeek V2 support (SinanAkkoyun, opened 1 month ago, 4 comments)
#442 Control Vectors (acidbubbles, opened 1 month ago, 0 comments)
#439 config.py (Huzaif2309, closed 2 weeks ago, 1 comment)
#438 [question] how to make generation deterministic? (yshui, closed 2 weeks ago, 1 comment)
#436 custom bos for cat (Kerushii, closed 1 month ago, 1 comment)
#435 Quantized Llama3 inference not working (BenjaminGantenbein, closed 1 month ago, 2 comments)
#434 Update from 0.0.19 to 0.0.20 with Python 3.11, torch 2.2.1 and CUDA 12.1: DLL load failed while importing exllamav2_ext: The specified procedure could not be found. (acidbubbles, closed 2 weeks ago, 11 comments)
#433 Qwen-110B quantize failed, RuntimeError: CUDA error: an illegal memory access was encountered (buliaoyin, closed 2 months ago, 3 comments)
#431 Layer Skip looks interesting (SinanAkkoyun, closed 2 months ago, 0 comments)
#428 FP16 + ROCm Possibly Subpar Performance (Beinsezii, closed 2 weeks ago, 3 comments)
#427 ImportError: /home/ec2-user/.cache/torch_extensions/py310_cu121/exllamav2_ext/exllamav2_ext.so: undefined symbol: _ZN3c104cuda14ExchangeDeviceEa (rjmehta1993, closed 1 week ago, 3 comments)
#426 Added option to prevent tokens from being penalized for repetition (Lyrcaxis, closed 1 month ago, 2 comments)
#425 Phi-3 Support (candre23, closed 2 months ago, 8 comments)
#424 Error when trying to quantize Llama3 70B instruct: module 'exllamav2_ext' has no attribute 'sim_anneal' (RodriMora, closed 2 months ago, 2 comments)
#422 Piece ID is out of range. (Ph0rk0z, closed 2 months ago, 4 comments)
#421 Action to build wheels on ROCm 6.0 (Orion-zhen, closed 2 months ago, 9 comments)
#420 Error loading turboderp/Llama-3-70B-Instruct-exl2 (pfan94, closed 2 months ago, 2 comments)
#419 CUDA OOM when quantizing Llama-3-8B-Instruct (dog3-l0ver, closed 2 months ago, 2 comments)
#417 Fix ROCm compile (turboderp, closed 2 months ago, 0 comments)
#416 Merge dev branch (turboderp, closed 2 months ago, 0 comments)
#415 Llama3 doesn't define `pad_token_id`, it defaults to 0, which tokenizer.json has as the token ID for '!' (VldmrB, closed 2 months ago, 9 comments)
#414 llama3 quant error (bdambrosio, closed 2 months ago, 2 comments)
#413 Torch error when loading GPTQ model (Fuckingnameless, closed 2 months ago, 8 comments)
#411 Triton based flash attention 2 that supports volta and up. (Ph0rk0z, closed 2 weeks ago, 11 comments)
#410 mixtral-8x22b / command-r generation wanders off. (bdambrosio, closed 2 months ago, 2 comments)
#409 An Issue with Finetuning of GPTQ-LoRA with ExllamaV2 MatMul Kernel (achew010, opened 2 months ago, 1 comment)
#408 Cannot load models saved with HF transformers due to shared tensors in safetensors (AndrewRyanChama, closed 2 weeks ago, 1 comment)
#407 Simple QuaRot proof of concept. (sgsdxzy, opened 2 months ago, 4 comments)
#405 TypeError: make_q_matrix(): incompatible function arguments when quantizing Cohere Command R v0.1 (welnaseth, closed 2 months ago, 1 comment)