turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.19k stars · 234 forks
Issues (newest first)
#408 Cannot load models saved with HF transformers due to shared tensors in safetensors (AndrewRyanChama, closed 2 weeks ago, 1 comment)
#407 Simple QuaRot proof of concept (sgsdxzy, opened 2 months ago, 4 comments)
#405 TypeError: make_q_matrix(): incompatible function arguments when quantizing Cohere Command R v0.1 (welnaseth, closed 2 months ago, 1 comment)
#404 command-r plus config (bdambrosio, closed 2 months ago, 4 comments)
#403 Merge dev branch (turboderp, closed 2 months ago, 0 comments)
#402 Add support for return_logits, return_ids, return_prompt toggles in base generator (aliencaocao, closed 2 months ago, 3 comments)
#401 exllamav2 very slow compared to llama-cpp-python? Or did I do something wrong? (rsoika, opened 2 months ago, 5 comments)
#400 Support C4AI Command-R+ (alexbrowngh, closed 1 month ago, 22 comments)
#399 Input the embedding tensor into LLMs? (aliencaocao, opened 2 months ago, 40 comments)
#397 ROCm Flash-Attention 2 (nktice, opened 2 months ago, 1 comment)
#396 Beam search support (ovowei, closed 1 week ago, 1 comment)
#394 MemoryError in python convert.py (kisimoff, closed 2 months ago, 2 comments)
#393 Issues with concurrent request handling using exllamav2 and Flask streaming (iammrj, closed 1 month ago, 3 comments)
#392 Jamba support (theyunt, closed 2 months ago, 1 comment)
#391 Illegal instruction crash due to AVX2 compile-time opts (AndrewRyanChama, closed 2 months ago, 4 comments)
#390 dbrx doesn't respect gpu_split, OOMs on the first GPU no matter what (tdrussell, closed 3 months ago, 3 comments)
#389 KeyError: 'measurement' (Katehuuh, closed 3 months ago, 2 comments)
#388 dbrx architecture (veryVANYA, closed 2 weeks ago, 37 comments)
#387 Incorporate RAG with exllamav2 (insanesac, closed 3 months ago, 2 comments)
#386 MemoryError (insanesac, closed 3 months ago, 3 comments)
#385 generate_simple still having issues with eos_token_id (shensmobile, closed 2 weeks ago, 8 comments)
#384 Reverting/rolling back filter state (xonfour, opened 3 months ago, 10 comments)
#383 RuntimeError: Insufficient VRAM for model and cache using load_autosplit_gen (xu-jenny, closed 3 months ago, 3 comments)
#382 Low-context bugs and errors (ThomasBaruzier, closed 3 months ago, 3 comments)
#381 EXL2 quantization for the Qwen model (zchen-cpu, closed 2 months ago, 7 comments)
#380 error: identifier "__hfma2" is undefined (timefliesfang, opened 3 months ago, 3 comments)
#379 Support GPT2 architecture (iamwavecut, closed 1 month ago, 1 comment)
#378 BF16 conversion completes, but result does not run properly in ooba/llama-cpp as context length is shortened from 8K to 2K (jim-plus, closed 2 months ago, 3 comments)
#377 Compiling the Torch C++ extension results in a `ModuleNotFoundError: No module named 'torch'` error, even though torch is installed (ThomasBaruzier, closed 3 months ago, 2 comments)
#376 Fix typo in table of qcache_eval.md (john-sp, closed 3 months ago, 1 comment)
#375 MemoryError on TinyLlama and Llama-70B-chat (MarcusGrattan, closed 3 months ago, 2 comments)
#374 Fix installation step (install requirements) and add multi-GPU explanation (Lyrcaxis, closed 3 months ago, 0 comments)
#373 Concurrency: add opportunities to abort on processing (bdashore3, closed 3 months ago, 1 comment)
#372 Support C4AI Command-R (djmaze, closed 3 months ago, 2 comments)
#371 Really high system RAM usage and slow load (NinjaPerson24119, closed 1 week ago, 12 comments)
#370 Size of VRAM reserved for cache much bigger than needed for cache_4bit? (ParkhutRoman, closed 3 months ago, 2 comments)
#367 Add flash attention to requirements.txt (Anthonyg5005, closed 3 months ago, 3 comments)
#366 The Google Colab seems to be broken (Vuizur, closed 3 months ago, 1 comment)
#365 Quanting community (Kerushii, closed 3 months ago, 0 comments)
#364 Expose xformers? (Ph0rk0z, closed 1 week ago, 6 comments)
#361 Q4 cache CUDA API calls fail to compile on ROCm HIP (kzha0, closed 3 months ago, 1 comment)
#360 Build on Ubuntu 20.04 for releases (Omegastick, closed 3 months ago, 2 comments)
#358 Something is wrong with flash attention (ParisNeo, closed 1 week ago, 4 comments)
#357 Llava support? (kinchahoy, closed 1 week ago, 5 comments)
#356 Performance drops with longer prompts? (qss-uzair, closed 1 week ago, 3 comments)
#354 Fix a couple of filter bugs (seanlynch, closed 4 months ago, 1 comment)
#353 Some Yi-34b models can't produce spaces; one I just quantized does. Regression? (tau0-deltav, closed 4 months ago, 6 comments)
#352 Add a mention of lollms-webui as another possible web UI that can use exllamav2 as a backend (ParisNeo, closed 4 months ago, 2 comments)
#350 Cannot load emma model with latest version (techvd, closed 2 months ago, 6 comments)
#349 Feature request: multi-GPU conversion (richardburleigh, closed 3 months ago, 6 comments)