turboderp exllama issues

turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

MIT License

2.77k stars, 220 forks

issues

Run on CPU without AVX2

#315 ZanMax opened 7 months ago
3
piece id is out of range

#314 chethanwiz opened 7 months ago
3
ValueError: Unrecognized layer: lm_head.q_groups on a new install

#313 Fuckingnameless closed 8 months ago
2
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/home/exllama/env/lib/python3.11/site-packages/sentencepiece' Check the permissions.

#312 Fuckingnameless closed 8 months ago
0
updates since 0.0.11 causing code to not compile on Ubuntu (23.04, 23.10) with AMD HIP / ROCm ( 5.6 5.7 6.0 ... )

#311 nktice closed 2 months ago
3
When will the bfloat16 type of GPTQ algorithm be supported?

#310 Kelang-Tian opened 11 months ago
0
Does it support the safetensors format?

#309 lucasjinreal opened 1 year ago
0
Error when using Beam Search

#308 bibekyess opened 1 year ago
0
Occasionally RuntimeError

#307 leegohi04517 opened 1 year ago
0
Using Exllama backend requires all the modules to be on GPU - how?

#306 tigerinus opened 1 year ago
1
Issue with how the --gpu_split / -gs argument works

#305 JustinKunzi closed 1 year ago
2
does the benchmark support batch size>1?

#304 deltaguo closed 1 year ago
1
test_inference.py : AttributeError: module 'exllamav2_ext' has no attribute 'rms_norm'

#302 DFuller134 closed 1 year ago
1
test_benchmark_inference.py broken?

#301 11415142513152119 closed 1 year ago
1
llama_cpp_python_cuda is not a supported wheel on this platform

#300 arif599 closed 1 year ago
1
Changing hyper-parameters after initialization without reloading weights from disk

#299 kmccleary3301 opened 1 year ago
0
finetuned Llama-2-7B-32K-Instruct-GPTQ only returns '\n'

#298 Napuh closed 6 months ago
1
Why can't the llama2 model output EOS id?

#295 pangr closed 1 year ago
4
doesn't use CUDA_HOME?

#293 j2l opened 1 year ago
0
list index out of range

#292 j2l closed 1 year ago
1
OSError: CUDA_HOME environment variable is not set.

#291 jamesbraza opened 1 year ago
8
CodeLLaMA + LoRA: RuntimeError: CUDA error: an illegal memory access was encountered

#290 juanps90 opened 1 year ago
3
GPU Inference from IPython

#289 Rajmehta123 opened 1 year ago
0
followed instructions with error

#288 hiqsociety opened 1 year ago
2
is it too much of me to ask for an MPI option like llama.cpp?

#286 hiqsociety closed 1 year ago
5
exception about replacing the op q4_matmul_kernel

#285 deltaguo closed 1 year ago
2
phi-1.5 support?

#284 SinanAkkoyun closed 1 year ago
5
multi stoptoken

#283 Kerushii closed 1 year ago
0
Multi-GPU issues

#281 nktice opened 1 year ago
9
Support for Baichuan2 models

#280 bernardx opened 1 year ago
1
Progress on the rewrite for older cards (Like the P40)

#279 TimyIsCool opened 1 year ago
1
LoRA appears to not be used after the first run

#278 technillogue closed 1 year ago
1
Is Tesla T4 supported?

#277 ivsanro1 closed 1 year ago
2
Multi-GPU inference?

#276 mbhenaff closed 1 year ago
1
Optimize q4_matmul

#275 QuarticCat closed 1 year ago
21
remove tokens that exceed the max_seq_len

#274 p11188536 opened 1 year ago
1
Completion abruptly stopped - RuntimeError: CUDA error: an illegal memory access was encountered

#273 Thireus opened 1 year ago
1
YaRN Support

#272 grimulkan opened 1 year ago
8
CodeLlama support

#270 ParisNeo opened 1 year ago
11
Running Llama2 on multiple GPUs outputs gibberish

#269 mirth closed 1 year ago
2
Support for AMD ROCm

#268 yehowshuaradialrad opened 1 year ago
1
Is it possible and efficient to load layers on demand?

#267 fahadh4ilyas opened 1 year ago
2
Speed on A100

#266 Ber666 opened 1 year ago
4
Optimize and extend ws example for chatbots

#265 Kerushii closed 1 year ago
0
Any blogs on the project?

#264 qizzzh opened 1 year ago
0
Performance issues

#263 bryanhpchiang opened 1 year ago
3
RoPE Frequency Base and Frequency Scale Support

#262 ChrisCates opened 1 year ago
3
Codellama 16K context length?

#261 ShahZ181 opened 1 year ago
3
Codellama support

#260 lucasjinreal opened 1 year ago
10
Cache size below max_seq_len?

#259 fahadh4ilyas closed 1 year ago
2