turboderp / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License · 2.66k stars · 214 forks
Issues
#315 · Run on CPU without AVX2 · ZanMax · opened 2 months ago · 3 comments
#314 · piece id is out of range · chethanwiz · opened 2 months ago · 3 comments
#313 · ValueError: Unrecognized layer: lm_head.q_groups on a new install · Fuckingnameless · closed 4 months ago · 2 comments
#312 · ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/home/exllama/env/lib/python3.11/site-packages/sentencepiece' Check the permissions. · Fuckingnameless · closed 4 months ago · 0 comments
#311 · updates since 0.0.11 causing code to not compile on Ubuntu (23.04, 23.10) with AMD HIP / ROCm ( 5.6 5.7 6.0 ... ) · nktice · opened 5 months ago · 2 comments
#310 · When will the bfloat16 type of GPTQ algorithm be supported? · Kelang-Tian · opened 6 months ago · 0 comments
#309 · Does it support safetytensor formate?> · lucasjinreal · opened 7 months ago · 0 comments
#308 · Error when using Beam Search · bibekyess · opened 7 months ago · 0 comments
#307 · Occasionally RuntimeError · leegohi04517 · opened 7 months ago · 0 comments
#306 · Using Exllama backend requires all the modules to be on GPU - how? · tigerinus · opened 8 months ago · 1 comment
#305 · Issue with How --gpu_split / -gs argument works. · JustinKunzi · closed 8 months ago · 2 comments
#304 · does the benchmark support batch size>1? · deltaguo · closed 8 months ago · 1 comment
#302 · test_inference.py : AttributeError: module 'exllamav2_ext' has no attribute 'rms_norm' · DFuller134 · closed 9 months ago · 1 comment
#301 · test_benchmark_inference.py broken? · 11415142513152119 · closed 9 months ago · 1 comment
#300 · llama_cpp_python_cuda is not a supported wheel on this platform · arif599 · closed 9 months ago · 1 comment
#299 · Changing hyper-parameters after initilization without reloading weights from disk. · kmccleary3301 · opened 9 months ago · 0 comments
#298 · finetuned Llama-2-7B-32K-Instruct-GPTQ only returns '\n' · Napuh · closed 1 month ago · 1 comment
#295 · Why can't the llama2 model output EOS id? · pangr · closed 9 months ago · 4 comments
#293 · doesn't use CUDA_HOME? · j2l · opened 9 months ago · 0 comments
#292 · list index out of range · j2l · closed 9 months ago · 1 comment
#291 · OSError: CUDA_HOME environment variable is not set. · jamesbraza · opened 9 months ago · 8 comments
#290 · CodeLLaMA + LoRA: RuntimeError: CUDA error: an illegal memory access was encountered · juanps90 · opened 9 months ago · 3 comments
#289 · GPU Inference from IPython · Rajmehta123 · opened 9 months ago · 0 comments
#288 · followed instructions with error · hiqsociety · opened 9 months ago · 2 comments
#286 · is it too much of me to ask for an MPI option like llama.cpp? · hiqsociety · closed 9 months ago · 5 comments
#285 · exception about replacing the op q4_matmul_kernel · deltaguo · closed 9 months ago · 2 comments
#284 · phi-1.5 support? · SinanAkkoyun · closed 9 months ago · 5 comments
#283 · multi stoptoken · Kerushii · closed 9 months ago · 0 comments
#281 · Multi-GPU issues · nktice · opened 9 months ago · 9 comments
#280 · Support for Baichuan2 models · bernardx · opened 9 months ago · 1 comment
#279 · Progress on the rewrite for older cards (Like the P40) · TimyIsCool · opened 9 months ago · 1 comment
#278 · LoRA appears to not be used after the first run · technillogue · closed 9 months ago · 1 comment
#277 · Is Tesla T4 supported? · ivsanro1 · closed 9 months ago · 2 comments
#276 · Multi-GPU inference? · mbhenaff · closed 10 months ago · 1 comment
#275 · Optimize q4_matmul · QuarticCat · closed 9 months ago · 21 comments
#274 · remove tokens that exceed the max_seq_len · p11188536 · opened 10 months ago · 1 comment
#273 · Completion abruptly stopped - RuntimeError: CUDA error: an illegal memory access was encountered · Thireus · opened 10 months ago · 1 comment
#272 · YaRN Support · grimulkan · opened 10 months ago · 8 comments
#270 · Codelama support · ParisNeo · opened 10 months ago · 11 comments
#269 · Running Llama2 on multiple GPUs outputs gibberish · mirth · closed 10 months ago · 2 comments
#268 · Support for AMD ROCM · yehowshuaradialrad · opened 10 months ago · 1 comment
#267 · Is it possible and efficient if load layer on demand? · fahadh4ilyas · opened 10 months ago · 2 comments
#266 · Speed on A100 · Ber666 · opened 10 months ago · 4 comments
#265 · Optimize and extend ws example for chatborts · Kerushii · closed 10 months ago · 0 comments
#264 · Any blogs on the project? · qizzzh · opened 10 months ago · 0 comments
#263 · Performance issues · bryanhpchiang · opened 10 months ago · 3 comments
#262 · RoPE Frequency Base and Frequency Scale Support · ChrisCates · opened 10 months ago · 3 comments
#261 · Codellama 16K context length? · ShahZ181 · opened 10 months ago · 3 comments
#260 · Codellama support · lucasjinreal · opened 10 months ago · 10 comments
#259 · Cache size below max_seq_len? · fahadh4ilyas · closed 10 months ago · 2 comments