turboderp / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License · 2.77k stars · 220 forks
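For context on what the issues below refer to: the repo exposes a small Python API for loading a GPTQ-quantized Llama model and generating text. The following is a minimal sketch modeled on the repo's example scripts; the model directory path is a placeholder, and the sampler settings are illustrative, not recommended values.

import os, glob

# Modules live in the exllama repo root (model.py, tokenizer.py, generator.py)
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Placeholder path: a directory holding config.json, tokenizer.model,
# and the quantized *.safetensors weights
model_directory = "/models/llama-13b-4bit-128g/"

config = ExLlamaConfig(os.path.join(model_directory, "config.json"))
config.model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

model = ExLlama(config)              # loads the quantized weights onto the GPU
tokenizer = ExLlamaTokenizer(os.path.join(model_directory, "tokenizer.model"))
cache = ExLlamaCache(model)          # attention key/value cache
generator = ExLlamaGenerator(model, tokenizer, cache)

generator.settings.temperature = 0.95   # illustrative sampler settings
generator.settings.top_p = 0.65

print(generator.generate_simple("Once upon a time,", max_new_tokens=100))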
Issues
#315 · Run on CPU without AVX2 · ZanMax · opened 7 months ago · 3 comments
#314 · piece id is out of range · chethanwiz · opened 7 months ago · 3 comments
#313 · ValueError: Unrecognized layer: lm_head.q_groups on a new install · Fuckingnameless · closed 8 months ago · 2 comments
#312 · ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/home/exllama/env/lib/python3.11/site-packages/sentencepiece' Check the permissions. · Fuckingnameless · closed 8 months ago · 0 comments
#311 · Updates since 0.0.11 cause code to not compile on Ubuntu (23.04, 23.10) with AMD HIP / ROCm (5.6, 5.7, 6.0, ...) · nktice · closed 2 months ago · 3 comments
#310 · When will the bfloat16 type of the GPTQ algorithm be supported? · Kelang-Tian · opened 11 months ago · 0 comments
#309 · Does it support the safetensors format? · lucasjinreal · opened 1 year ago · 0 comments
#308 · Error when using Beam Search · bibekyess · opened 1 year ago · 0 comments
#307 · Occasional RuntimeError · leegohi04517 · opened 1 year ago · 0 comments
#306 · Using the Exllama backend requires all the modules to be on GPU - how? · tigerinus · opened 1 year ago · 1 comment
#305 · Issue with how the --gpu_split / -gs argument works · JustinKunzi · closed 1 year ago · 2 comments
#304 · Does the benchmark support batch size > 1? · deltaguo · closed 1 year ago · 1 comment
#302 · test_inference.py: AttributeError: module 'exllamav2_ext' has no attribute 'rms_norm' · DFuller134 · closed 1 year ago · 1 comment
#301 · test_benchmark_inference.py broken? · 11415142513152119 · closed 1 year ago · 1 comment
#300 · llama_cpp_python_cuda is not a supported wheel on this platform · arif599 · closed 1 year ago · 1 comment
#299 · Changing hyperparameters after initialization without reloading weights from disk · kmccleary3301 · opened 1 year ago · 0 comments
#298 · Fine-tuned Llama-2-7B-32K-Instruct-GPTQ only returns '\n' · Napuh · closed 6 months ago · 1 comment
#295 · Why can't the llama2 model output the EOS id? · pangr · closed 1 year ago · 4 comments
#293 · Doesn't use CUDA_HOME? · j2l · opened 1 year ago · 0 comments
#292 · list index out of range · j2l · closed 1 year ago · 1 comment
#291 · OSError: CUDA_HOME environment variable is not set. · jamesbraza · opened 1 year ago · 8 comments
#290 · CodeLlama + LoRA: RuntimeError: CUDA error: an illegal memory access was encountered · juanps90 · opened 1 year ago · 3 comments
#289 · GPU inference from IPython · Rajmehta123 · opened 1 year ago · 0 comments
#288 · Followed instructions, got an error · hiqsociety · opened 1 year ago · 2 comments
#286 · Is it too much to ask for an MPI option like llama.cpp? · hiqsociety · closed 1 year ago · 5 comments
#285 · Exception about replacing the op q4_matmul_kernel · deltaguo · closed 1 year ago · 2 comments
#284 · phi-1.5 support? · SinanAkkoyun · closed 1 year ago · 5 comments
#283 · Multiple stop tokens · Kerushii · closed 1 year ago · 0 comments
#281 · Multi-GPU issues · nktice · opened 1 year ago · 9 comments
#280 · Support for Baichuan2 models · bernardx · opened 1 year ago · 1 comment
#279 · Progress on the rewrite for older cards (like the P40) · TimyIsCool · opened 1 year ago · 1 comment
#278 · LoRA appears to not be used after the first run · technillogue · closed 1 year ago · 1 comment
#277 · Is the Tesla T4 supported? · ivsanro1 · closed 1 year ago · 2 comments
#276 · Multi-GPU inference? · mbhenaff · closed 1 year ago · 1 comment
#275 · Optimize q4_matmul · QuarticCat · closed 1 year ago · 21 comments
#274 · Remove tokens that exceed the max_seq_len · p11188536 · opened 1 year ago · 1 comment
#273 · Completion abruptly stopped - RuntimeError: CUDA error: an illegal memory access was encountered · Thireus · opened 1 year ago · 1 comment
#272 · YaRN Support · grimulkan · opened 1 year ago · 8 comments
#270 · CodeLlama support · ParisNeo · opened 1 year ago · 11 comments
#269 · Running Llama2 on multiple GPUs outputs gibberish · mirth · closed 1 year ago · 2 comments
#268 · Support for AMD ROCm · yehowshuaradialrad · opened 1 year ago · 1 comment
#267 · Is it possible and efficient to load layers on demand? · fahadh4ilyas · opened 1 year ago · 2 comments
#266 · Speed on A100 · Ber666 · opened 1 year ago · 4 comments
#265 · Optimize and extend the ws example for chatbots · Kerushii · closed 1 year ago · 0 comments
#264 · Any blogs on the project? · qizzzh · opened 1 year ago · 0 comments
#263 · Performance issues · bryanhpchiang · opened 1 year ago · 3 comments
#262 · RoPE Frequency Base and Frequency Scale Support · ChrisCates · opened 1 year ago · 3 comments
#261 · CodeLlama 16K context length? · ShahZ181 · opened 1 year ago · 3 comments
#260 · CodeLlama support · lucasjinreal · opened 1 year ago · 10 comments
#259 · Cache size below max_seq_len? · fahadh4ilyas · closed 1 year ago · 2 comments