turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.19k stars · 234 forks
Issues (newest first)
#408 Cannot load models saved with HF transformers due to shared tensors in safetensors (AndrewRyanChama, closed 2 weeks ago, 1 comment)
#407 Simple QuaRot proof of concept (sgsdxzy, opened 2 months ago, 4 comments)
#405 TypeError: make_q_matrix(): incompatible function arguments when quantizing Cohere Command R v0.1 (welnaseth, closed 2 months ago, 1 comment)
#404 command-r plus config (bdambrosio, closed 2 months ago, 4 comments)
#403 Merge dev branch (turboderp, closed 2 months ago, 0 comments)
#402 Add support for return_logits, return_ids, return_prompt toggles in base generator (aliencaocao, closed 2 months ago, 3 comments)
#401 exllamav2 very slow compared to llama-cpp-python? Or did I do something wrong? (rsoika, opened 2 months ago, 5 comments)
#400 Support C4AI Command-R+ (alexbrowngh, closed 1 month ago, 22 comments)
#399 Input the embedding tensor into LLMs? (aliencaocao, opened 2 months ago, 40 comments)
#397 ROCm Flash-Attention 2 (nktice, opened 2 months ago, 1 comment)
#396 Beam search support (ovowei, closed 1 week ago, 1 comment)
#394 MemoryError in python convert.py (kisimoff, closed 2 months ago, 2 comments)
#393 Issues with concurrent request handling using exllamav2 and Flask streaming (iammrj, closed 1 month ago, 3 comments)
#392 Jamba support (theyunt, closed 2 months ago, 1 comment)
#391 Illegal instruction crash due to AVX2 compile-time opts (AndrewRyanChama, closed 2 months ago, 4 comments)
#390 dbrx doesn't respect gpu_split, OOMs on the first GPU no matter what (tdrussell, closed 3 months ago, 3 comments)
#389 KeyError: 'measurement' (Katehuuh, closed 3 months ago, 2 comments)
#388 dbrx architecture (veryVANYA, closed 2 weeks ago, 37 comments)
#387 Incorporate RAG with exllamav2 (insanesac, closed 3 months ago, 2 comments)
#386 MemoryError (insanesac, closed 3 months ago, 3 comments)
#385 generate_simple still having issues with eos_token_id (shensmobile, closed 2 weeks ago, 8 comments)
#384 Reverting/rolling back filter state (xonfour, opened 3 months ago, 10 comments)
#383 RuntimeError: Insufficient VRAM for model and cache using load_autosplit_gen (xu-jenny, closed 3 months ago, 3 comments)
#382 Low-context bugs and errors (ThomasBaruzier, closed 3 months ago, 3 comments)
#381 EXL2 quantization for the Qwen model (zchen-cpu, closed 2 months ago, 7 comments)
#380 error: identifier "__hfma2" is undefined (timefliesfang, opened 3 months ago, 3 comments)
#379 Support GPT2 architecture (iamwavecut, closed 1 month ago, 1 comment)
#378 BF16 conversion completes, but result does not run properly in ooba/llama-cpp as context length is shortened from 8K to 2K (jim-plus, closed 2 months ago, 3 comments)
#377 Compiling the Torch C++ extension results in a `ModuleNotFoundError: No module named 'torch'` error, even though torch is installed (ThomasBaruzier, closed 3 months ago, 2 comments)
#376 Fix typo in table of qcache_eval.md (john-sp, closed 3 months ago, 1 comment)
#375 MemoryError on TinyLlama and Llama-70B-chat (MarcusGrattan, closed 3 months ago, 2 comments)
#374 Fix installation step (install requirements) and add multi-GPU explanation (Lyrcaxis, closed 3 months ago, 0 comments)
#373 Concurrency: add opportunities to abort on processing (bdashore3, closed 3 months ago, 1 comment)
#372 Support C4AI Command-R (djmaze, closed 3 months ago, 2 comments)
#371 Really high system RAM usage and slow load (NinjaPerson24119, closed 1 week ago, 12 comments)
#370 Size of VRAM reserved for cache much bigger than needed for cache_4bit? (ParkhutRoman, closed 3 months ago, 2 comments)
#367 Add flash attention to requirements.txt (Anthonyg5005, closed 3 months ago, 3 comments)
#366 The Google Colab seems to be broken (Vuizur, closed 3 months ago, 1 comment)
#365 Quanting community (Kerushii, closed 3 months ago, 0 comments)
#364 Expose xformers? (Ph0rk0z, closed 1 week ago, 6 comments)
#361 Q4 cache CUDA API calls fail to compile on ROCm HIP (kzha0, closed 3 months ago, 1 comment)
#360 Build on Ubuntu 20.04 for releases (Omegastick, closed 3 months ago, 2 comments)
#358 Something is wrong with flash attention (ParisNeo, closed 1 week ago, 4 comments)
#357 Llava support? (kinchahoy, closed 1 week ago, 5 comments)
#356 Performance drops with longer prompts? (qss-uzair, closed 1 week ago, 3 comments)
#354 Fix a couple of filter bugs (seanlynch, closed 4 months ago, 1 comment)
#353 Some Yi-34b models can't produce spaces; one I just quantized does. Regression? (tau0-deltav, closed 4 months ago, 6 comments)
#352 Add a mention of lollms-webui as another possible web UI that can use exllamav2 as a backend (ParisNeo, closed 4 months ago, 2 comments)
#350 Cannot load emma model with latest version (techvd, closed 2 months ago, 6 comments)
#349 Feature request: multi-GPU conversion (richardburleigh, closed 3 months ago, 6 comments)