turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.2k stars · 236 forks
Issues (newest first)
#302 Significant performance degradation between 0.11 and 0.12 when used from ooba (aikitoria, closed 2 weeks ago, 17 comments)
#301 GPU not supported on Nvidia Jetson AGX with JetPack 5.1 (davidtheITguy, opened 5 months ago, 4 comments)
#300 [BUG] test_mmlu.py does not support MoE models. (ThomasBaruzier, closed 3 weeks ago, 1 comment)
#299 4 bit quantization performance? (cvhoang, closed 3 months ago, 6 comments)
#298 Fails to compile with ROCm (niansa, closed 2 weeks ago, 14 comments)
#297 How to improve the performance of gemm_half_q_half_gptq_kernel operator? (lovelynight, opened 5 months ago, 1 comment)
#296 Update tokenizer.py to decode list of tensors as well as single tensor (ba2512005, closed 5 months ago, 1 comment)
#295 Marlin from the GPTQ guys (ghost, closed 5 months ago, 2 comments)
#294 `compile_model` failed with "TypeError: unsupported operand type(s) for |=: 'dict' and 'dict'" (gabinguo, closed 5 months ago, 2 comments)
#293 Yi-Yi 2x34b+ merges generate very slowly. (Ph0rk0z, opened 5 months ago, 8 comments)
#292 Reducing generating response (40 token/s to 10 token/s) using chat.py (tednas, opened 5 months ago, 4 comments)
#291 Clear cache to avoid OOM with iterative generation (cdreetz, closed 2 weeks ago, 6 comments)
#290 87225fe "Optimize kernel batch performance" breaks some chat queries (bjj, closed 5 months ago, 8 comments)
#289 safetensors with aio.h does not build on windows (bjj, closed 5 months ago, 2 comments)
#288 Improve StreamingGenerator stop conditions efficiency (TMK04, closed 5 months ago, 1 comment)
#287 TypeError: unhashable type: 'slice' when converting and quantizing (tm17-abcgen, closed 5 months ago, 2 comments)
#286 Can I use multi gpus to load my model to inference (UncleFB, closed 5 months ago, 1 comment)
#285 Support for multimodal models (ParisNeo, opened 5 months ago, 0 comments)
#284 Grammar Support on Exllamav2 (tednas, closed 2 weeks ago, 3 comments)
#283 InternLM2 Support (brucethemoose, closed 2 weeks ago, 5 comments)
#282 Test_inference for a single prompt on an array of input texts (tednas, closed 2 weeks ago, 2 comments)
#281 AttributeError: 'float' object has no attribute 'item' during convert.py measurement pass (sophosympatheia, closed 5 months ago, 2 comments)
#280 [BUG] test_mmlu.py doesn't produce any results (ThomasBaruzier, closed 5 months ago, 2 comments)
#279 Possible problem in zero output (chu-tianxiang, opened 5 months ago, 2 comments)
#277 Update setup.py UX responses and module checks with helpful messaging (bgorlick, opened 5 months ago, 4 comments)
#276 Add LLM-FTC-sampling (catid, closed 5 months ago, 3 comments)
#275 Repeat layers to create FrankenModels (dnhkng, opened 5 months ago, 34 comments)
#274 Draft model error when predicting at max context (ivsanro1, closed 2 weeks ago, 1 comment)
#273 Measurement fails, no error message (biship, opened 5 months ago, 4 comments)
#272 Conversion fails for Nous-Hermes-2-SOLAR-10.7B (epicfilemcnulty, closed 5 months ago, 2 comments)
#271 Is this a good way to implement lora `module_to_save` for `embed_token` and `lm_head`? (fahadh4ilyas, opened 5 months ago, 0 comments)
#269 How to not quantize lm_head? (fahadh4ilyas, closed 5 months ago, 2 comments)
#268 CUDA out of memory, but that doesn't seem to be true (Sebastianv650, closed 2 weeks ago, 3 comments)
#267 DeepSeek Coder produces only blank completions (viktor-ferenczi, closed 5 months ago, 2 comments)
#266 [Feature Request] A way to determine which stop sequence caused the stop (or if it was instead caused by the EOS token or `max_new_tokens`) (josephrocca, closed 2 weeks ago, 1 comment)
#265 Constrained generation. What is needed? (meditans, closed 2 weeks ago, 1 comment)
#264 Fix case where there are no disallowed tokens in `websocket_actions.py` (josephrocca, closed 5 months ago, 1 comment)
#263 Add dynatemp (the entropy one) (awtrisk, closed 5 months ago, 7 comments)
#262 Inquiring about Calibration Procedures and Issues for Model Using Specified Dataset (MatrixC7, closed 5 months ago, 11 comments)
#261 Integration with llamaindex for RAG (mirix, opened 6 months ago, 4 comments)
#260 Token healing (under 40 LOC) (ahmed-moubtahij, closed 6 months ago, 1 comment)
#259 Loras: Remove qkv assertion (bdashore3, closed 6 months ago, 0 comments)
#258 Documentation for understanding (JINO-ROHIT, closed 2 weeks ago, 1 comment)
#257 Can you help me split EXL2 weights for multi gpu? (bob-just-bob, closed 2 weeks ago, 2 comments)
#256 Freeze after import (maybe ROCm only) (lufixSch, closed 4 months ago, 9 comments)
#255 CFG gave me OOM in tabbyAPI. (Ph0rk0z, closed 2 weeks ago, 1 comment)
#252 Using HF Safetensors (cdreetz, closed 6 months ago, 7 comments)
#251 add openchat prompt format (eramax, closed 6 months ago, 0 comments)
#250 how to implement the backend of dynamic batch? (tanklandry, closed 2 weeks ago, 1 comment)
#249 Can load GPTQ models fine, but when running Can't infere gptq models, i get the follow tracebak (userbox020, opened 6 months ago, 0 comments)