juncongmoo / pyllama
LLaMA: Open and Efficient Foundation Language Models
GNU General Public License v3.0 · 2.8k stars · 312 forks
Issues (sorted by newest first)
#62 Killed (javierp183, opened 1 year ago, 6 comments)
#61 Any way to infer a quantized model on multi GPUs? (Imagium719, opened 1 year ago, 1 comment)
#60 Quantize Original LLaMA Model Files (htcml, opened 1 year ago, 3 comments)
#59 Let it run under WSL (daniel-kukiela, closed 1 year ago, 0 comments)
#58 Quantization with "groupsize" makes the results completely wrong. (daniel-kukiela, opened 1 year ago, 8 comments)
#57 already quantize to 4bit and get the model pyllama-7B4b.pt,but can not run in RTX3080. report torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.00 GiB total capacity; 9.24 GiB already allocated; (elven2016, opened 1 year ago, 2 comments)
#56 Downloading get stuck in infinite loop (jarimustonen, opened 1 year ago, 13 comments)
#55 Error trying Quantize 7B model to 8-bit (guoti777, closed 1 year ago, 2 comments)
#53 Docker install (mgpai22, opened 1 year ago, 0 comments)
#52 Meaningless Prediction in 13B 2bit (axenov, opened 1 year ago, 3 comments)
#51 error when installing (zzzgit, closed 1 year ago, 1 comment)
#50 Error Downloading Models from Community on Winodws (mmortazavi, opened 1 year ago, 5 comments)
#49 add suggestion for quantization and some bug fixes (juncongmoo, closed 1 year ago, 0 comments)
#48 error when running model for inference: ModuleNotFoundError: No module named 'transformers.models.llama' (AbFynd, closed 1 year ago, 2 comments)
#47 pyllama/downloads returns empty folders (flyjgh, opened 1 year ago, 34 comments)
#46 How can I input prompt when I use multi GPU? (liydxl, opened 1 year ago, 1 comment)
#45 Share your evaluate result (jeff3071, opened 1 year ago, 3 comments)
#44 fix argument in convert_llama (a1ex90, closed 1 year ago, 0 comments)
#43 AttributeError: module 'numpy' has no attribute 'array' (jameswan, opened 1 year ago, 0 comments)
#42 watch downloading speed and restart downloading if it drops to very low (gmlove, closed 1 year ago, 0 comments)
#41 Error trying Quantize 7B model to 2-bit (willintonmb, opened 1 year ago, 5 comments)
#40 Quantize 7B model to 8-bit --> "Killed" (hex4def6, closed 1 year ago, 1 comment)
#39 "KeyError: 'llama'" (DirtyKnightForVi, closed 1 year ago, 0 comments)
#38 "torch.cuda.OutOfMemoryError: CUDA out of memory" when I'm *not* out of memory (LoganDark, opened 1 year ago, 3 comments)
#37 ModuleNotFoundError: No module named 'quant_cuda' (AceBeaker2, opened 1 year ago, 15 comments)
#36 Unkown cuda error (AceBeaker2, closed 1 year ago, 3 comments)
#35 ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported. (ilovedbsql, closed 1 year ago, 10 comments)
#34 ModuleNotFoundError: No module named 'llama.hf' (vetka925, closed 1 year ago, 4 comments)
#33 No module named "transformers" error (SimoGiuffrida, closed 1 year ago, 1 comment)
#32 example.py FAILED (yangzhipeng1108, closed 1 year ago, 1 comment)
#31 Model mismatch for 13B (BOB603049648, opened 1 year ago, 3 comments)
#30 ModuleNotFoundError: No module named 'quant_cuda' (WeissAzura, closed 1 year ago, 3 comments)
#29 Download takes forever (puyuanliu, closed 1 year ago, 2 comments)
#28 Model does not split for 65B (YixinSong-e, opened 1 year ago, 5 comments)
#27 How to run llama_quant without downloading models from huggingface ? (B2F, opened 1 year ago, 1 comment)
#26 Error when download models (paulocoutinhox, opened 1 year ago, 5 comments)
#25 world size assertionerror (sharlec, closed 1 year ago, 6 comments)
#24 M1 inference (zmactep, opened 1 year ago, 1 comment)
#23 multiple GPU support (mldevorg, closed 1 year ago, 0 comments)
#22 Execuse me, How to use chat mode? (baifachuan, closed 1 year ago, 0 comments)
#21 convert (mldevorg, closed 1 year ago, 0 comments)
#20 add simple input loop to inference.py (lucemia, closed 1 year ago, 0 comments)
#19 Bug fix3 (juncongmoo, closed 1 year ago, 0 comments)
#18 fix a bug (mldevorg, closed 1 year ago, 0 comments)
#17 fix document (mldevorg, closed 1 year ago, 0 comments)
#16 add quant and download info (juncongmoo, closed 1 year ago, 0 comments)
#15 Vanilla pytorch LLaMA implementation (galatolofederico, closed 1 year ago, 3 comments)
#14 Struggle with training LLaMA with a single GPU using both PT v1 and v2 (linhduongtuan, closed 1 year ago, 4 comments)
#13 Docker Playground With LLaMA And PyLLaMA (soulteary, closed 1 year ago, 1 comment)
#12 Sorry,I can't run (KingingWang, opened 1 year ago, 8 comments)