turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.2k stars · 235 forks
Issues
#353 · Some Yi-34b models can't produce spaces. One I just quantized does. Regression? (tau0-deltav, closed 4 months ago, 6 comments)
#352 · Added a mention of lollms-webui as another possible webui that can be used with exllamav2 as a backend (ParisNeo, closed 4 months ago, 2 comments)
#350 · Cannot load emma model with latest version (techvd, closed 2 months ago, 6 comments)
#349 · Feature request: Multi-gpu conversion (richardburleigh, closed 3 months ago, 6 comments)
#348 · understanding config.max_input_len (bdambrosio, closed 4 months ago, 2 comments)
#347 · Reserved ram not freed? (Celppu, closed 2 weeks ago, 5 comments)
#346 · Slow inference speed on A100? (jingzhaoou, closed 2 weeks ago, 2 comments)
#345 · Issue when Rocm and Cuda on windows (sorasoras, open, 4 months ago, 2 comments)
#344 · Fix for converting to safetensor with wildcards (bartowski1182, closed 3 months ago, 0 comments)
#343 · Is LoRa working as the same as exllamav2 loader in text-generation-webui? (LiangA, open, 4 months ago, 0 comments)
#341 · TypeError during exllamav2 model quantization (1PercentSync, closed 4 months ago, 2 comments)
#340 · EndeavourOS + ROCm crashes computer (NinjaPerson24119, closed 2 weeks ago, 5 comments)
#339 · Smaug (bdambrosio, closed 4 months ago, 8 comments)
#338 · Prebuilt CUDA wheels no longer run on Ubuntu 20.04 (jriesen, closed 3 months ago, 6 comments)
#337 · Memory Management and BSOD (Annamae-beep, closed 3 months ago, 3 comments)
#336 · AQML compression/quantization (Tedy50, open, 4 months ago, 0 comments)
#335 · Tuning the ethical guidelines of ExLlamaV2 (redshiva, closed 3 months ago, 6 comments)
#334 · Is Smaug 70B supported? (rjmehta1993, closed 2 weeks ago, 17 comments)
#333 · Is there a google colab to quantize llms using exllamav2? (JamesKnight0001, closed 2 weeks ago, 2 comments)
#332 · Support Qwen1.5 (mymymy1303, closed 4 months ago, 3 comments)
#331 · Fix tuple returns with the streaming generator (bdashore3, closed 4 months ago, 0 comments)
#330 · Refactor token healing initialization. (bjj, open, 4 months ago, 4 comments)
#328 · HIP kernel errors (userbox020, open, 4 months ago, 3 comments)
#326 · Evaluate with lm-evaluation-harness (OfficialDelta, closed 4 months ago, 2 comments)
#325 · Question on repeating a prompt (dnhkng, closed 4 months ago, 5 comments)
#324 · 2 minor changes (flying-x, closed 4 months ago, 0 comments)
#323 · Any benefit to choosing something other than wikitext for code models? (irthomasthomas, closed 5 months ago, 2 comments)
#322 · Increase context length (virentakia, closed 2 weeks ago, 1 comment)
#321 · Converted 120B model "killed" message appears and exits in Layer0. (sat0r1r1, closed 3 months ago, 4 comments)
#320 · `Some tensors share memory` error with convert.py (brucethemoose, closed 5 months ago, 3 comments)
#319 · `import exllamav2` getting this error (hemangjoshi37a, closed 4 months ago, 11 comments)
#318 · what is `path_to_model` in the starter command in the readme? (hemangjoshi37a, closed 4 months ago, 1 comment)
#317 · Kalomaze's Quadratic Sampling (AAbushady, closed 5 months ago, 8 comments)
#316 · ExLlamaV2Embedding can't be unloaded if it failed to load (bjj, closed 2 weeks ago, 3 comments)
#315 · JSON schema/format (tednaseri, closed 3 months ago, 3 comments)
#314 · Build from source issue in conda environment (kibilogic, closed 5 months ago, 2 comments)
#313 · Resolved compiler Warnings for typecasting for proper byte reading and comparison (bgorlick, closed 5 months ago, 0 comments)
#312 · Optimize Checkpoint File Saving and Handling in Quantize.py with Atomic Operation (bgorlick, closed 5 months ago, 1 comment)
#311 · [Feature Suggestion] SmoothQuant (W8A8) leads to ~50% better throughput (DreamGenX, open, 5 months ago, 0 comments)
#310 · Add graceful exit sig handling and status box for quantization to assist estimating completion time and overall accuracy (bgorlick, closed 5 months ago, 0 comments)
#309 · why there isn't a more popular training approach based on exl2. (laoda513, open, 5 months ago, 5 comments)
#308 · Optionally return logits from streaming generator (silphendio, closed 5 months ago, 0 comments)
#306 · Typo in conversion/qparams.py (4PiR2, closed 5 months ago, 1 comment)
#305 · Quantization Error (2) (152334H, closed 5 months ago, 3 comments)
#304 · Trying to quantize a model for the first time. First stage completes, second stage finishes all layers then gives runtime error CUBLAS_STATUS_EXECUTION_FAILED (longtimegone, closed 5 months ago, 5 comments)
#303 · Quantization is no longer possible in version 0.0.12 (Nyandaro, closed 5 months ago, 2 comments)
#302 · Significant performance degradation between 0.11 and 0.12 when used from ooba (aikitoria, closed 2 weeks ago, 17 comments)
#301 · GPU not supported on Nvidia Jetson AGX with JetPack 5.1 (davidtheITguy, open, 5 months ago, 4 comments)
#300 · [BUG] test_mmlu.py does not support MoE models. (ThomasBaruzier, closed 2 weeks ago, 1 comment)
#299 · 4 bit quantization performance? (cvhoang, closed 3 months ago, 6 comments)