# turboderp/exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

MIT License · 3.56k stars · 273 forks
## Issues (newest first)
| # | Title | Author | Status | Comments |
|---|---|---|---|---|
| #194 | Load models' tokenizer.json file with utf-8 encoding. | VldmrB | closed 10 months ago | 1 |
| #193 | support for awq | frankxyy | closed 4 months ago | 0 |
| #192 | Added DeepSeek Coder Instruct to chat example | SinanAkkoyun | closed 10 months ago | 0 |
| #191 | Fixed code block syntax highlighting for long code | SinanAkkoyun | closed 10 months ago | 1 |
| #190 | Chat code syntax highlighter has problems when out of view | SinanAkkoyun | closed 10 months ago | 4 |
| #189 | Add Huggingface tokenzier support | DOGEwbx | closed 10 months ago | 13 |
| #188 | Support for Huggingface Fast Tokenizers | bibekyess | closed 10 months ago | 13 |
| #187 | Support LookaheadDecoding? | yhyu13 | closed 10 months ago | 2 |
| #186 | Batched generations are very similar | anujnayyar1 | closed 10 months ago | 2 |
| #185 | Upcasting all the calculations. | Ph0rk0z | opened 10 months ago | 3 |
| #184 | Any plans for Mac/Metal support? | aikitoria | closed 9 months ago | 3 |
| #183 | How do I deal with custom EOS? | teknium1 | closed 10 months ago | 1 |
| #182 | fix infer websocket_actions.py | Kerushii | closed 10 months ago | 0 |
| #181 | Different groupsizes for different bitwidths? | dalistarh | closed 10 months ago | 4 |
| #180 | [Question] `h_gemm` Implementation | jeromeku | closed 9 months ago | 2 |
| #179 | convert.py ending with "Killed" at lm_head layer when converting zephyr-7b | Christopheraburns | closed 4 months ago | 1 |
| #176 | QuIP#, New SOTA(?) 2-bit Quantization Method | brucethemoose | opened 10 months ago | 6 |
| #175 | Is there a colab or something which shows all the code necessary to setup the project? | slooi | closed 10 months ago | 1 |
| #174 | EXL2 Proposal: Llamacpp Allocation Heuristics and Specialisation Degree Feature | aljungberg | closed 4 months ago | 12 |
| #173 | Windows 10, Oobabooga, inability to load some models | homeworkace | closed 10 months ago | 3 |
| #171 | Improve model load performance | abstractdescutcheon | closed 8 months ago | 4 |
| #169 | Lora doesn't impact the model outputs | matankley | closed 9 months ago | 1 |
| #168 | AssertionError: Insufficient space in device allocation | Double-bear | closed 9 months ago | 2 |
| #167 | Update base.py to remove useless statement | MilesQLi | closed 9 months ago | 0 |
| #166 | OOM with gpu_split_auto, must specify split manually | cikkle | closed 4 months ago | 8 |
| #165 | Performance with speculative decoding is slightly worse than without at full context | cikkle | closed 9 months ago | 3 |
| #164 | dynamic enable special tokens | wangyu1997 | closed 8 months ago | 1 |
| #163 | How can you clear the cache of the exllamav2? | Rajmehta123 | closed 5 months ago | 2 |
| #162 | CPU offloading? | oobabooga | closed 10 months ago | 1 |
| #160 | self.rms_norm_eps = read_config["rms_norm_eps"] KeyError: 'rms_norm_eps' (Qwen model not supported) | tutu329 | closed 9 months ago | 8 |
| #159 | No module named 'exllamav2_ext' when loading a model | ParisNeo | closed 5 months ago | 11 |
| #158 | exllamav2 Installation Error : [Errno 20] Not a directory: 'hipconfig'. | watchstep | closed 10 months ago | 6 |
| #157 | Tweak to multiple cache example | dvianisoho | closed 11 months ago | 2 |
| #156 | Got error when converting Llama2 70b | GolemXlV | closed 11 months ago | 2 |
| #155 | Yi models generates some languages in an incorrect encoding | samuelazran | closed 11 months ago | 2 |
| #154 | Simplify HIP compatibility | ardfork | closed 11 months ago | 8 |
| #153 | Very poor performance when VRAM is nearly full (inconsistent) | QM60 | closed 6 months ago | 13 |
| #152 | Error When Converting Safetensor to exl2 | Noobville1345 | closed 9 months ago | 3 |
| #151 | 70B Quant Potential Issue | azureblackprime | closed 11 months ago | 2 |
| #150 | Support for 01-ai/Yi series of models? | anujnayyar1 | closed 11 months ago | 2 |
| #149 | Suggestion: allow different context lengths for draft model and main model in speculative sampling | Antollo | closed 11 months ago | 3 |
| #148 | CU12+ appears to be unsupported? | SebJansen | closed 9 months ago | 15 |
| #146 | response id and infer modification | Kerushii | closed 10 months ago | 1 |
| #145 | Implement Efficient Streaming Language Models with Attention Sinks | BigArty | closed 6 months ago | 1 |
| #144 | rocm5.2.0 torch run test_inference.py error undefined symbol: hipblasHgemm | Rane2021 | closed 9 months ago | 2 |
| #143 | [ROCM] Gibberish on wave64 devices | IMbackK | closed 11 months ago | 1 |
| #142 | The turboderp 70B quants on HuggingFace are broken | 11415142513152119 | closed 11 months ago | 2 |
| #140 | Suggestion: Implement MinP | Chanka0 | closed 11 months ago | 4 |
| #139 | how many time should it spend to convert llama2 70b to 2bit? on single V10032GB | zlh1992 | closed 9 months ago | 2 |
| #138 | Exl2 model fails to run -- RuntimeError: Insufficient size of temp_dq buffer | truenorth8 | closed 11 months ago | 2 |