turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.2k stars · 235 forks
Issues
#353 · Some Yi-34b models can't produce spaces. One I just quantized does. Regression? (tau0-deltav, closed 4 months ago, 6 comments)
#352 · Added a mention of lollms-webui as another possible webui that can be used with exllamav2 as a backend (ParisNeo, closed 4 months ago, 2 comments)
#350 · Cannot load emma model with latest version (techvd, closed 2 months ago, 6 comments)
#349 · Feature request: Multi-gpu conversion (richardburleigh, closed 3 months ago, 6 comments)
#348 · understanding config.max_input_len (bdambrosio, closed 4 months ago, 2 comments)
#347 · Reserved ram not freed? (Celppu, closed 2 weeks ago, 5 comments)
#346 · Slow inference speed on A100? (jingzhaoou, closed 2 weeks ago, 2 comments)
#345 · Issue when Rocm and Cuda on windows (sorasoras, open, 4 months ago, 2 comments)
#344 · Fix for converting to safetensor with wildcards (bartowski1182, closed 3 months ago, 0 comments)
#343 · Is LoRa working as the same as exllamav2 loader in text-generation-webui? (LiangA, open, 4 months ago, 0 comments)
#341 · TypeError during exllamav2 model quantization (1PercentSync, closed 4 months ago, 2 comments)
#340 · EndeavourOS + ROCm crashes computer (NinjaPerson24119, closed 2 weeks ago, 5 comments)
#339 · Smaug (bdambrosio, closed 4 months ago, 8 comments)
#338 · Prebuilt CUDA wheels no longer run on Ubuntu 20.04 (jriesen, closed 3 months ago, 6 comments)
#337 · Memory Management and BSOD (Annamae-beep, closed 3 months ago, 3 comments)
#336 · AQML compression/quantization (Tedy50, open, 4 months ago, 0 comments)
#335 · Tuning the ethical guidelines of ExLlamaV2 (redshiva, closed 3 months ago, 6 comments)
#334 · Is Smaug 70B supported? (rjmehta1993, closed 2 weeks ago, 17 comments)
#333 · Is there a google colab to quantize llms using exllamav2? (JamesKnight0001, closed 2 weeks ago, 2 comments)
#332 · Support Qwen1.5 (mymymy1303, closed 4 months ago, 3 comments)
#331 · Fix tuple returns with the streaming generator (bdashore3, closed 4 months ago, 0 comments)
#330 · Refactor token healing initialization. (bjj, open, 4 months ago, 4 comments)
#328 · HIP kernel errors (userbox020, open, 4 months ago, 3 comments)
#326 · Evaluate with lm-evaluation-harness (OfficialDelta, closed 4 months ago, 2 comments)
#325 · Question on repeating a prompt (dnhkng, closed 4 months ago, 5 comments)
#324 · 2 minor changes (flying-x, closed 4 months ago, 0 comments)
#323 · Any benefit to choosing something other than wikitext for code models? (irthomasthomas, closed 5 months ago, 2 comments)
#322 · Increase context length (virentakia, closed 2 weeks ago, 1 comment)
#321 · Converted 120B model "killed" message appears and exits in Layer0. (sat0r1r1, closed 3 months ago, 4 comments)
#320 · `Some tensors share memory` error with convert.py (brucethemoose, closed 5 months ago, 3 comments)
#319 · `import exllamav2` getting this error (hemangjoshi37a, closed 4 months ago, 11 comments)
#318 · what is `path_to_model` in the starter command in the readme? (hemangjoshi37a, closed 4 months ago, 1 comment)
#317 · Kalomaze's Quadratic Sampling (AAbushady, closed 5 months ago, 8 comments)
#316 · ExLlamaV2Embedding can't be unloaded if it failed to load (bjj, closed 2 weeks ago, 3 comments)
#315 · JSON schema/format (tednaseri, closed 3 months ago, 3 comments)
#314 · Build from source issue in conda environment (kibilogic, closed 5 months ago, 2 comments)
#313 · Resolved compiler Warnings for typecasting for proper byte reading and comparison (bgorlick, closed 5 months ago, 0 comments)
#312 · Optimize Checkpoint File Saving and Handling in Quantize.py with Atomic Operation (bgorlick, closed 5 months ago, 1 comment)
#311 · [Feature Suggestion] SmoothQuant (W8A8) leads to ~50% better throughput (DreamGenX, open, 5 months ago, 0 comments)
#310 · Add graceful exit sig handling and status box for quantization to assist estimating completion time and overall accuracy (bgorlick, closed 5 months ago, 0 comments)
#309 · why there isn't a more popular training approach based on exl2. (laoda513, open, 5 months ago, 5 comments)
#308 · Optionally return logits from streaming generator (silphendio, closed 5 months ago, 0 comments)
#306 · Typo in conversion/qparams.py (4PiR2, closed 5 months ago, 1 comment)
#305 · Quantization Error (2) (152334H, closed 5 months ago, 3 comments)
#304 · Trying to quantize a model for the first time. First stage completes, second stage finishes all layers then gives runtime error CUBLAS_STATUS_EXECUTION_FAILED (longtimegone, closed 5 months ago, 5 comments)
#303 · Quantization is no longer possible in version 0.0.12 (Nyandaro, closed 5 months ago, 2 comments)
#302 · Significant performance degradation between 0.11 and 0.12 when used from ooba (aikitoria, closed 2 weeks ago, 17 comments)
#301 · GPU not supported on Nvidia Jetson AGX with JetPack 5.1 (davidtheITguy, open, 5 months ago, 4 comments)
#300 · [BUG] test_mmlu.py does not support MoE models. (ThomasBaruzier, closed 2 weeks ago, 1 comment)
#299 · 4 bit quantization performance? (cvhoang, closed 3 months ago, 6 comments)