lucidrains / PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM.
MIT License · 7.7k stars · 668 forks
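For context on the issue list below, here is a minimal usage sketch of the package, assuming the PaLM, RewardModel, and RLHFTrainer interfaces described in the repository's README; exact class and argument names may differ from the current source and should be checked before use.

```python
# Sketch only: names and signatures assumed from the repository README,
# not verified against the current source.
import torch
from palm_rlhf_pytorch import PaLM, RewardModel, RLHFTrainer

# 1. Pretrain (or load) a PaLM language model with a standard autoregressive loss
palm = PaLM(num_tokens=20000, dim=512, depth=8)

seq = torch.randint(0, 20000, (1, 1024))
loss = palm(seq, return_loss=True)
loss.backward()

# 2. Wrap the language model in a reward model and fit it on human-ranked data
reward_model = RewardModel(palm, num_binned_output=5)

seq = torch.randint(0, 20000, (1, 1024))
prompt_mask = torch.zeros(1, 1024).bool()   # True where tokens belong to the prompt
labels = torch.randint(0, 5, (1,))          # binned human feedback
reward_loss = reward_model(seq, prompt_mask=prompt_mask, labels=labels)
reward_loss.backward()

# 3. Fine-tune the policy with RLHF (PPO) against the trained reward model
prompts = torch.randint(0, 20000, (100, 256))   # token ids of training prompts
trainer = RLHFTrainer(palm=palm, reward_model=reward_model, prompt_token_ids=prompts)
trainer.train(num_episodes=50000)
```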
Issues (sorted by newest)
#60 · A bug in the implementation of top-p sampling · allblueJT · opened 1 month ago · 0 comments
#59 · Is there any documentation to train this on my own data? · gauravgandhi1315 · opened 8 months ago · 0 comments
#58 · How to use LoRA? · xiaoguzai · opened 8 months ago · 0 comments
#57 · Should the critic's input be the prompt only? · ginward · opened 11 months ago · 0 comments
#56 · Possible incorrect creation of rotary embeddings · AndyBarcia · closed 1 year ago · 1 comment
#55 · Update train.py · the-lord-nothing · closed 1 year ago · 0 comments
#54 · Flash Attention 2 · conceptofmind · closed 4 months ago · 0 comments
#52 · Implement an argument to directly set ff_inner_dim · chris-ha458 · opened 1 year ago · 3 comments
#51 · I looked at the LLaMA source code and there is an intermediate layer · wac81 · opened 1 year ago · 0 comments
#50 · Create 那个 · userodk · closed 1 year ago · 0 comments
#49 · Model Name · conceptofmind · closed 1 year ago · 3 comments
#48 · Is memory-efficient attention enabled by default if I don't use flash attention? · wac81 · opened 1 year ago · 3 comments
#47 · Speed up with flash attention on an A6000? · wac81 · closed 1 year ago · 2 comments
#46 · norm.gamma not used during backprop · conceptofmind · closed 1 year ago · 2 comments
#45 · I used other params with PaLM, but got an error · wac81 · closed 1 year ago · 4 comments
#44 · Column and Row Parallel Linear for Apex Tensor Parallel · conceptofmind · closed 1 year ago · 1 comment
#43 · Calculating the KL loss seems to have a mistake · Nightbringers · closed 1 year ago · 1 comment
#42 · Reason for using the pooled critic embedding instead of the last embedding for the value head · gblackout · closed 1 year ago · 3 comments
#41 · Confusion about KL divergence calculation for human feedback policies · dwyzzy · closed 1 year ago · 13 comments
#40 · Add PyTorch 2.0 Flash Attention · conceptofmind · closed 1 year ago · 17 comments
#39 · Mask raised an error · gongel · closed 1 year ago · 2 comments
#38 · KL divergence loss · taynoel · closed 1 year ago · 1 comment
#37 · Reward model training issue · wac81 · opened 1 year ago · 1 comment
#36 · Cannot train the model using PyTorch version 2? · linhduongtuan · closed 1 year ago · 1 comment
#35 · Value function · tonylin52 · opened 1 year ago · 0 comments
#34 · Test chat · strint · closed 1 year ago · 0 comments
#33 · Is it possible to train this AI using Open Assistant, or vice versa? · qwertystars · closed 1 year ago · 1 comment
#32 · Can we exploit the AGI ability of ChatGPT? · youkpan · closed 1 year ago · 0 comments
#31 · Is this shift right for the action logits? · kisseternity · closed 1 year ago · 4 comments
#30 · Do you need CUDA for this? · beew · closed 1 year ago · 1 comment
#29 · Are there any diagrams that describe the PaLM architecture? · guotong1988 · closed 1 year ago · 1 comment
#28 · Value function input · kkissmart · closed 1 year ago · 1 comment
#26 · KL_div/ratio on policy · kkissmart · closed 1 year ago · 0 comments
#24 · Is it possible to replace PaLM with another Hugging Face pretrained language model? · noanti · opened 1 year ago · 2 comments
#23 · ✨ 😅 Is it possible to use OpenAI's ChatGPT to train this ChatGPT? · Yonv1943 · opened 1 year ago · 8 comments
#22 · The loss function of the reward model · huzechuan · opened 1 year ago · 2 comments
#21 · A few questions on training · aakashrkumar · opened 1 year ago · 3 comments
#20 · How to fine-tune and train on my own data? · rbhatia46 · opened 1 year ago · 0 comments
#19 · Training the reward model · farhad-abdi · closed 1 year ago · 8 comments
#18 · PaLM-rlhf-pytorch Roadmap · HappyPony · closed 1 year ago · 4 comments
#17 · Help with computational power · byteunix · closed 1 year ago · 4 comments
#16 · Is it possible to release a version based on JAX? · sglucas · closed 1 year ago · 7 comments
#15 · Simple Web Interface · conceptofmind · closed 1 year ago · 2 comments
#14 · Why do the value calculations in generate and learn use different masks? · Nightbringers · closed 1 year ago · 1 comment
#13 · PaLM · Phob3tor · closed 1 year ago · 0 comments
#12 · Can we replace PPO + RLHF with a preference model (basically a transformer encoder plus a sigmoid head, trained with BCE), and during fine-tuning perform reward maximization by making the reward model predict 1s? · ssintelli · closed 1 year ago · 5 comments
#11 · I'm dumb · cardonasMind · closed 1 year ago · 1 comment
#10 · Bug fix: Correct function call in RewardModel->finetune_parameters · QasimWani · closed 1 year ago · 2 comments
#9 · Can I train a model on my own data? · sveisa · closed 1 year ago · 1 comment
#8 · Noob question: How can I use this model for inference? · PrasoonPratham · closed 1 year ago · 1 comment