tloen / llama-int8
Quantized inference code for LLaMA models
GNU General Public License v3.0 · 1.05k stars · 105 forks
Issues
#21 · Does this support llama2 as well? · YaoJiayi · opened 9 months ago · 0 comments
#20 · Producing NaN tensors · Bryan-Lavender · opened 1 year ago · 0 comments
#19 · CUDA out of memory · fengyh3 · closed 1 year ago · 0 comments
#18 · 65B on multiple GPUs: CUDA out of memory with 4 × RTX A5000 (24GB), 96GB in total · scampion · opened 1 year ago · 3 comments
#17 · LLaMA 13B works on a single RTX 4080 16GB · kcchu · opened 1 year ago · 1 comment
#16 · Further detail needed: installing bitsandbytes from source · chrisbward · opened 1 year ago · 1 comment
#15 · bitsandbytes: NameError: name 'cuda_setup' is not defined. Did you mean: 'CUDASetup'? · kskim-phd · closed 1 year ago · 1 comment
#14 · feat: webui for llama-int8 · soulteary · opened 1 year ago · 0 comments
#13 · Assign the parameters of each layer to multiple CUDA devices automatically · lipan6461188 · opened 1 year ago · 0 comments
#12 · Getting an error on generation on Windows · elephantpanda · opened 1 year ago · 4 comments
#11 · Can 65B run on 4 × 32GB GPUs? · zhongtao93 · opened 1 year ago · 0 comments
#10 · Is it possible to save the smaller weights so they don't have to be converted each time? · spullara · opened 1 year ago · 0 comments
#9 · Systematic comparison of original models to int8 inferencing · innokean · opened 1 year ago · 1 comment
#8 · Error loading 65B on a single A100 80GB with about 96GB of memory · dpyneo · opened 1 year ago · 3 comments
#7 · RTX 4090: CUDA out of memory · WuNein · closed 1 year ago · 3 comments
#6 · Any chance to share quantized int8 7B and 13B models? · progressionnetwork · opened 1 year ago · 0 comments
#5 · Can 8GB run the smallest LLaMA model? · lucasjinreal · opened 1 year ago · 4 comments
#4 · Tracking issue for Mac support · pannous · opened 1 year ago · 3 comments
#3 · Reduce RAM consumption on loading · pamparamm · closed 1 year ago · 0 comments
#2 · 13B: load is successful on T4, but forward pass fails · deep-diver · opened 1 year ago · 0 comments
#1 · On branch add_save_load · calhounpaul · closed 1 year ago · 3 comments