turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License · 3.23k stars · 238 forks
Issues
#248 · Any examples on long inputs on rope scaled model? (sreeprasannar, closed 3 weeks ago, 1 comment)
#246 · Quantizing goliath120b @ 3bpw: calibration perplexity (quant): 2745.1239 (alexconstant9108, closed 6 months ago, 8 comments)
#245 · Mistral fails/garbage at context > 8192, transformers works fine (matatonic, closed 6 months ago, 2 comments)
#244 · Feature request: EAGLE (vt404v2, closed 3 months ago, 1 comment)
#243 · Batched flash attention (fahadh4ilyas, opened 6 months ago, 2 comments)
#242 · example_batchprocessing.py (Kerushii, closed 3 months ago, 0 comments)
#241 · feat: frequency and presence penalty (AlpinDale, closed 6 months ago, 0 comments)
#240 · Batched generation with flash attention (fahadh4ilyas, closed 6 months ago, 2 comments)
#239 · feat: add top-A sampling (AlpinDale, closed 6 months ago, 0 comments)
#238 · Return token probabilities in generator.stream() (ivsanro1, closed 5 months ago, 3 comments)
#237 · add flash attention feature to different seqlen batch (fahadh4ilyas, closed 6 months ago, 1 comment)
#236 · What is "an improved exllamav2 quant method"? (yamosin, closed 6 months ago, 2 comments)
#235 · Attempted to quant a custom MoE model, Plap-8x13b, and get an error (NiriProject, closed 3 weeks ago, 1 comment)
#234 · TypeError: ExLlamaV2Tokenizer.encode() got an unexpected keyword argument 'return_offsets' (Rajmehta123, closed 6 months ago, 1 comment)
#233 · [ROCM] [GFX1030] no output (IMbackK, closed 3 weeks ago, 2 comments)
#232 · Model with batching returns different answers even with the same input (fahadh4ilyas, closed 3 weeks ago, 15 comments)
#231 · Fix encoder in MMLU benchmark (dvdtoth, closed 6 months ago, 0 comments)
#230 · How to use gpu_split in inference.py example (irthomasthomas, opened 6 months ago, 0 comments)
#229 · Installation instructions "Method 1" dysfunctional (takosalad, opened 6 months ago, 1 comment)
#228 · Merge experimental (turboderp, closed 6 months ago, 0 comments)
#227 · Merge changes from master (turboderp, closed 6 months ago, 0 comments)
#226 · cache.clone() is not creating a copy of the cache (hidoba, closed 6 months ago, 1 comment)
#225 · CPU offloading (bibidentuhanoi, closed 6 months ago, 2 comments)
#224 · Stop conditions and exclude prompt for Base generator (SinanAkkoyun, opened 6 months ago, 0 comments)
#223 · Mixtral (nivibilla, closed 6 months ago, 13 comments)
#222 · Some GPTQ models cannot be loaded anymore (sammyf, closed 6 months ago, 3 comments)
#221 · Support DragonFox style "BaNnbAnN" (Kerushii, closed 6 months ago, 1 comment)
#220 · Is seed actually used? (richardburleigh, closed 7 months ago, 1 comment)
#219 · DeepSeek: ValueError: bytes must be in range(0, 256) (SinanAkkoyun, closed 6 months ago, 2 comments)
#218 · Fixed multi file and wildcard args (SinanAkkoyun, closed 6 months ago, 13 comments)
#217 · add QuiP quant support (waters222, opened 7 months ago, 3 comments)
#216 · About YiTokenizer errors (redwoodzero0, closed 6 months ago, 2 comments)
#215 · ExLlamaV2Cache_8bit does not work with multiple_caches.py example (lopuhin, closed 6 months ago, 2 comments)
#213 · Error quantizing models on recent commit (brucethemoose, closed 7 months ago, 4 comments)
#212 · How to clear/reset the cache so that the model doesn't remember the earlier response? (Rajmehta123, closed 6 months ago, 2 comments)
#211 · (Oobabooga) Can't load GPTQ models anymore with ExLlama-V2 0.0.10 (Daviljoe193, closed 7 months ago, 2 comments)
#209 · Allow padding data instead of concatenating when generating calibration dataset (ivsanro1, opened 7 months ago, 0 comments)
#206 · Adding return_lowest_perplexity (ziadloo, opened 7 months ago, 0 comments)
#204 · Added draft model rope scale to chat example (SinanAkkoyun, closed 7 months ago, 0 comments)
#202 · Difference between gemm_half_q_half_gptq_kernel and gemm_half_q_half_kernel (frankxyy, closed 7 months ago, 0 comments)
#201 · Quantization error "Warning: Applied additional damping" and "Hessian" (yamosin, closed 7 months ago, 2 comments)
#200 · Generating a batch of different prompt sizes, the shorter prompts tend to suffer (ziadloo, closed 3 weeks ago, 7 comments)
#199 · flash attention does nothing (Tedy50, closed 2 months ago, 5 comments)
#198 · Support for no_repeat_ngram_size (anujnayyar1, opened 7 months ago, 0 comments)
#196 · Support GPT2 tokenizer for CausalLM 72b (CyberTimon, closed 7 months ago, 6 comments)
#195 · Fix Unicode errors when loading files (bdashore3, closed 7 months ago, 0 comments)
#194 · Load models' tokenizer.json file with utf-8 encoding (VldmrB, closed 7 months ago, 1 comment)
#193 · support for awq (frankxyy, closed 1 month ago, 0 comments)
#192 · Added DeepSeek Coder Instruct to chat example (SinanAkkoyun, closed 7 months ago, 0 comments)
#191 · Fixed code block syntax highlighting for long code (SinanAkkoyun, closed 7 months ago, 1 comment)