turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.56k stars · 273 forks

Issues
#252  Using HF Safetensors (cdreetz, closed 9 months ago, 7 comments)
#251  Add openchat prompt format (eramax, closed 9 months ago, 0 comments)
#250  How to implement the backend for dynamic batching? (tanklandry, closed 3 months ago, 1 comment)
#249  Can load GPTQ models fine, but get a traceback when running inference on them (userbox020, closed 3 months ago, 1 comment)
#248  Any examples of long inputs on a RoPE-scaled model? (sreeprasannar, closed 3 months ago, 1 comment)
#246  Quantizing goliath120b @ 3bpw: calibration perplexity (quant): 2745.1239 (alexconstant9108, closed 9 months ago, 8 comments)
#245  Mistral fails/garbage at context > 8192, transformers works fine (matatonic, closed 9 months ago, 2 comments)
#244  Feature request: EAGLE (vt404v2, closed 6 months ago, 1 comment)
#243  Batched flash attention (fahadh4ilyas, closed 2 months ago, 3 comments)
#242  example_batchprocessing.py (Kerushii, closed 7 months ago, 0 comments)
#241  feat: frequency and presence penalty (AlpinDale, closed 9 months ago, 0 comments)
#240  Batched generation with flash attention (fahadh4ilyas, closed 9 months ago, 2 comments)
#239  feat: add top-A sampling (AlpinDale, closed 9 months ago, 0 comments)
#238  Return token probabilities in generator.stream() (ivsanro1, closed 8 months ago, 3 comments)
#237  Add flash attention support for batches with different sequence lengths (fahadh4ilyas, closed 9 months ago, 1 comment)
#236  What is "an improved exllamav2 quant method"? (yamosin, closed 9 months ago, 2 comments)
#235  Attempted to quant a custom MoE model, Plap-8x13b, and get an error (NiriProject, closed 3 months ago, 1 comment)
#234  TypeError: ExLlamaV2Tokenizer.encode() got an unexpected keyword argument 'return_offsets' (Rajmehta123, closed 9 months ago, 1 comment)
#233  [ROCM] [GFX1030] no output (IMbackK, closed 3 months ago, 2 comments)
#232  Batched generation returns different answers even with the same input (fahadh4ilyas, closed 3 months ago, 15 comments)
#231  Fix encoder in MMLU benchmark (dvdtoth, closed 10 months ago, 0 comments)
#230  How to use gpu_split in the inference.py example (irthomasthomas, closed 3 months ago, 1 comment)
#229  Installation instructions "Method 1" dysfunctional (takosalad, closed 2 weeks ago, 2 comments)
#228  Merge experimental (turboderp, closed 10 months ago, 0 comments)
#227  Merge changes from master (turboderp, closed 10 months ago, 0 comments)
#226  cache.clone() is not creating a copy of the cache (hidoba, closed 10 months ago, 1 comment)
#225  CPU offloading (bibidentuhanoi, closed 10 months ago, 2 comments)
#224  Stop conditions and exclude prompt for Base generator (SinanAkkoyun, closed 2 months ago, 2 comments)
#223  Mixtral (nivibilla, closed 10 months ago, 13 comments)
#222  Some GPTQ models can not be loaded anymore (sammyf, closed 10 months ago, 3 comments)
#221  Support DragonFox style "BaNnbAnN" (Kerushii, closed 10 months ago, 1 comment)
#220  Is seed actually used? (richardburleigh, closed 10 months ago, 1 comment)
#219  DeepSeek: ValueError: bytes must be in range(0, 256) (SinanAkkoyun, closed 9 months ago, 2 comments)
#218  Fixed multi file and wildcard args (SinanAkkoyun, closed 9 months ago, 13 comments)
#217  Add QuiP quant support (waters222, opened 10 months ago, 3 comments)
#216  About YiTokenizer errors (redwoodzero0, closed 9 months ago, 2 comments)
#215  ExLlamaV2Cache_8bit does not work with multiple_caches.py example (lopuhin, closed 10 months ago, 2 comments)
#213  Error quantizing models on recent commit (brucethemoose, closed 10 months ago, 4 comments)
#212  How to clear/reset the cache so the model doesn't remember the earlier response? (Rajmehta123, closed 10 months ago, 2 comments)
#211  (Oobabooga) Can't load GPTQ models anymore with ExLlama-V2 0.0.10 (Daviljoe193, closed 10 months ago, 2 comments)
#209  Allow padding data instead of concatenating when generating calibration dataset (ivsanro1, closed 2 weeks ago, 1 comment)
#206  Adding return_lowest_perplexity (ziadloo, opened 10 months ago, 0 comments)
#204  Added draft model rope scale to chat example (SinanAkkoyun, closed 10 months ago, 0 comments)
#202  Difference between gemm_half_q_half_gptq_kernel and gemm_half_q_half_kernel (frankxyy, closed 10 months ago, 0 comments)
#201  Quantization error "Warning: Applied additional damping" and "Hessian" (yamosin, closed 10 months ago, 2 comments)
#200  When generating a batch of different prompt sizes, the shorter prompts tend to suffer (ziadloo, closed 3 months ago, 7 comments)
#199  Flash attention does nothing (Tedy50, closed 5 months ago, 5 comments)
#198  Support for no_repeat_ngram_size (anujnayyar1, closed 2 weeks ago, 1 comment)
#196  Support GPT2 tokenizer for CausalLM 72b (CyberTimon, closed 10 months ago, 6 comments)
#195  Fix Unicode errors when loading files (bdashore3, closed 10 months ago, 0 comments)