turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.2k stars · 236 forks
Issues (newest first)
#302 Significant performance degradation between 0.11 and 0.12 when used from ooba (aikitoria, closed 2 weeks ago, 17 comments)
#301 GPU not supported on Nvidia Jetson AGX with JetPack 5.1 (davidtheITguy, opened 5 months ago, 4 comments)
#300 [BUG] test_mmlu.py does not support MoE models. (ThomasBaruzier, closed 3 weeks ago, 1 comment)
#299 4 bit quantization performance? (cvhoang, closed 3 months ago, 6 comments)
#298 Fails to compile with ROCm (niansa, closed 2 weeks ago, 14 comments)
#297 How to improve the performance of gemm_half_q_half_gptq_kernel operator? (lovelynight, opened 5 months ago, 1 comment)
#296 Update tokenizer.py to decode list of tensors as well as single tensor (ba2512005, closed 5 months ago, 1 comment)
#295 Marlin from the GPTQ guys (ghost, closed 5 months ago, 2 comments)
#294 `compile_model` failed with "TypeError: unsupported operand type(s) for |=: 'dict' and 'dict'" (gabinguo, closed 5 months ago, 2 comments)
#293 Yi-Yi 2x34b+ merges generate very slowly. (Ph0rk0z, opened 5 months ago, 8 comments)
#292 Reducing generating response (40 token/s to 10 token/s) using chat.py (tednas, opened 5 months ago, 4 comments)
#291 Clear cache to avoid OOM with iterative generation (cdreetz, closed 2 weeks ago, 6 comments)
#290 87225fe "Optimize kernel batch performance" breaks some chat queries (bjj, closed 5 months ago, 8 comments)
#289 safetensors with aio.h does not build on windows (bjj, closed 5 months ago, 2 comments)
#288 Improve StreamingGenerator stop conditions efficiency (TMK04, closed 5 months ago, 1 comment)
#287 TypeError: unhashable type: 'slice' when converting and quantizing (tm17-abcgen, closed 5 months ago, 2 comments)
#286 Can I use multi gpus to load my model to inference (UncleFB, closed 5 months ago, 1 comment)
#285 Support for multimodal models (ParisNeo, opened 5 months ago, 0 comments)
#284 Grammar Support on Exllamav2 (tednas, closed 2 weeks ago, 3 comments)
#283 InternLM2 Support (brucethemoose, closed 2 weeks ago, 5 comments)
#282 Test_inference for a single prompt on an array of input texts (tednas, closed 2 weeks ago, 2 comments)
#281 AttributeError: 'float' object has no attribute 'item' during convert.py measurement pass (sophosympatheia, closed 5 months ago, 2 comments)
#280 [BUG] test_mmlu.py doesn't produce any results (ThomasBaruzier, closed 5 months ago, 2 comments)
#279 Possible problem in zero output (chu-tianxiang, opened 5 months ago, 2 comments)
#277 Update setup.py UX responses and module checks with helpful messaging (bgorlick, opened 5 months ago, 4 comments)
#276 Add LLM-FTC-sampling (catid, closed 5 months ago, 3 comments)
#275 Repeat layers to create FrankenModels (dnhkng, opened 5 months ago, 34 comments)
#274 Draft model error when predicting at max context (ivsanro1, closed 2 weeks ago, 1 comment)
#273 Measurement fails, no error message (biship, opened 5 months ago, 4 comments)
#272 Conversion fails for Nous-Hermes-2-SOLAR-10.7B (epicfilemcnulty, closed 5 months ago, 2 comments)
#271 Is this a good way to implement lora `module_to_save` for `embed_token` and `lm_head`? (fahadh4ilyas, opened 5 months ago, 0 comments)
#269 How to not quantize lm_head? (fahadh4ilyas, closed 5 months ago, 2 comments)
#268 CUDA out of memory, but that doesn't seem to be true (Sebastianv650, closed 2 weeks ago, 3 comments)
#267 DeepSeek Coder produces only blank completions (viktor-ferenczi, closed 5 months ago, 2 comments)
#266 [Feature Request] A way to determine which stop sequence caused the stop (or if it was instead caused by the EOS token or `max_new_tokens`) (josephrocca, closed 2 weeks ago, 1 comment)
#265 Constrained generation. What is needed? (meditans, closed 2 weeks ago, 1 comment)
#264 Fix case where there are no disallowed tokens in `websocket_actions.py` (josephrocca, closed 5 months ago, 1 comment)
#263 Add dynatemp (the entropy one) (awtrisk, closed 5 months ago, 7 comments)
#262 Inquiring about Calibration Procedures and Issues for Model Using Specified Dataset (MatrixC7, closed 5 months ago, 11 comments)
#261 Integration with llamaindex for RAG (mirix, opened 6 months ago, 4 comments)
#260 Token healing (under 40 LOC) (ahmed-moubtahij, closed 6 months ago, 1 comment)
#259 Loras: Remove qkv assertion (bdashore3, closed 6 months ago, 0 comments)
#258 Documentation for understanding (JINO-ROHIT, closed 2 weeks ago, 1 comment)
#257 Can you help me split EXL2 weights for multi gpu? (bob-just-bob, closed 2 weeks ago, 2 comments)
#256 Freeze after import (maybe ROCm only) (lufixSch, closed 4 months ago, 9 comments)
#255 CFG gave me OOM in tabbyAPI. (Ph0rk0z, closed 2 weeks ago, 1 comment)
#252 Using HF Safetensors (cdreetz, closed 6 months ago, 7 comments)
#251 add openchat prompt format (eramax, closed 6 months ago, 0 comments)
#250 how to implement the backend of dynamic batch? (tanklandry, closed 2 weeks ago, 1 comment)
#249 Can load GPTQ models fine, but when running Can't infere gptq models, i get the follow tracebak (userbox020, opened 6 months ago, 0 comments)