turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.23k stars · 238 forks
Issues
#248 · Any examples on long inputs on rope scaled model? (sreeprasannar, closed 3 weeks ago, 1 comment)
#246 · Quantizing goliath120b @ 3bpw: calibration perplexity (quant): 2745.1239 (alexconstant9108, closed 6 months ago, 8 comments)
#245 · Mistral fails/garbage at context > 8192, transformers works fine (matatonic, closed 6 months ago, 2 comments)
#244 · Feature request: EAGLE (vt404v2, closed 3 months ago, 1 comment)
#243 · Batched flash attention (fahadh4ilyas, opened 6 months ago, 2 comments)
#242 · example_batchprocessing.py (Kerushii, closed 3 months ago, 0 comments)
#241 · feat: frequency and presence penalty (AlpinDale, closed 6 months ago, 0 comments)
#240 · Batched generation with flash attention (fahadh4ilyas, closed 6 months ago, 2 comments)
#239 · feat: add top-A sampling (AlpinDale, closed 6 months ago, 0 comments)
#238 · Return token probabilities in generator.stream() (ivsanro1, closed 5 months ago, 3 comments)
#237 · add flash attention feature to different seqlen batch (fahadh4ilyas, closed 6 months ago, 1 comment)
#236 · What is "an improved exllamav2 quant method"? (yamosin, closed 6 months ago, 2 comments)
#235 · Attempted to quant a custom MoE model, Plap-8x13b, and get an error (NiriProject, closed 3 weeks ago, 1 comment)
#234 · TypeError: ExLlamaV2Tokenizer.encode() got an unexpected keyword argument 'return_offsets' (Rajmehta123, closed 6 months ago, 1 comment)
#233 · [ROCM] [GFX1030] no output (IMbackK, closed 3 weeks ago, 2 comments)
#232 · Model with batching returns different answers even with the same input (fahadh4ilyas, closed 3 weeks ago, 15 comments)
#231 · Fix encoder in MMLU benchmark (dvdtoth, closed 6 months ago, 0 comments)
#230 · How to use gpu_split in inference.py example (irthomasthomas, opened 6 months ago, 0 comments)
#229 · Installation instructions "Method 1" dysfunctional (takosalad, opened 6 months ago, 1 comment)
#228 · Merge experimental (turboderp, closed 6 months ago, 0 comments)
#227 · Merge changes from master (turboderp, closed 6 months ago, 0 comments)
#226 · cache.clone() is not creating a copy of the cache (hidoba, closed 6 months ago, 1 comment)
#225 · CPU offloading (bibidentuhanoi, closed 6 months ago, 2 comments)
#224 · Stop conditions and exclude prompt for Base generator (SinanAkkoyun, opened 6 months ago, 0 comments)
#223 · Mixtral (nivibilla, closed 6 months ago, 13 comments)
#222 · Some GPTQ models cannot be loaded anymore (sammyf, closed 6 months ago, 3 comments)
#221 · Support DragonFox style "BaNnbAnN" (Kerushii, closed 6 months ago, 1 comment)
#220 · Is seed actually used? (richardburleigh, closed 7 months ago, 1 comment)
#219 · DeepSeek: ValueError: bytes must be in range(0, 256) (SinanAkkoyun, closed 6 months ago, 2 comments)
#218 · Fixed multi file and wildcard args (SinanAkkoyun, closed 6 months ago, 13 comments)
#217 · add QuiP quant support (waters222, opened 7 months ago, 3 comments)
#216 · About YiTokenizer errors (redwoodzero0, closed 6 months ago, 2 comments)
#215 · ExLlamaV2Cache_8bit does not work with multiple_caches.py example (lopuhin, closed 6 months ago, 2 comments)
#213 · Error quantizing models on recent commit (brucethemoose, closed 7 months ago, 4 comments)
#212 · How to clear/reset the cache so that the model doesn't remember the earlier response? (Rajmehta123, closed 6 months ago, 2 comments)
#211 · (Oobabooga) Can't load GPTQ models anymore with ExLlama-V2 0.0.10 (Daviljoe193, closed 7 months ago, 2 comments)
#209 · Allow padding data instead of concatenating when generating calibration dataset (ivsanro1, opened 7 months ago, 0 comments)
#206 · Adding return_lowest_perplexity (ziadloo, opened 7 months ago, 0 comments)
#204 · Added draft model rope scale to chat example (SinanAkkoyun, closed 7 months ago, 0 comments)
#202 · Difference between gemm_half_q_half_gptq_kernel and gemm_half_q_half_kernel (frankxyy, closed 7 months ago, 0 comments)
#201 · Quantization error "Warning: Applied additional damping" and "Hessian" (yamosin, closed 7 months ago, 2 comments)
#200 · Generating a batch of different prompt sizes, the shorter prompts tend to suffer (ziadloo, closed 3 weeks ago, 7 comments)
#199 · flash attention does nothing (Tedy50, closed 2 months ago, 5 comments)
#198 · Support for no_repeat_ngram_size (anujnayyar1, opened 7 months ago, 0 comments)
#196 · Support GPT2 tokenizer for CausalLM 72b (CyberTimon, closed 7 months ago, 6 comments)
#195 · Fix Unicode errors when loading files (bdashore3, closed 7 months ago, 0 comments)
#194 · Load models' tokenizer.json file with utf-8 encoding (VldmrB, closed 7 months ago, 1 comment)
#193 · support for awq (frankxyy, closed 1 month ago, 0 comments)
#192 · Added DeepSeek Coder Instruct to chat example (SinanAkkoyun, closed 7 months ago, 0 comments)
#191 · Fixed code block syntax highlighting for long code (SinanAkkoyun, closed 7 months ago, 1 comment)