lucidrains / PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM.
MIT License · 7.7k stars · 668 forks
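For context on the issue list below, here is a minimal usage sketch of the package, assuming the PaLM, RewardModel, and RLHFTrainer interfaces described in the repository's README; exact class and argument names may differ from the current source and should be checked before use.

```python
# Sketch only: names and signatures assumed from the repository README,
# not verified against the current source.
import torch
from palm_rlhf_pytorch import PaLM, RewardModel, RLHFTrainer

# 1. Pretrain (or load) a PaLM language model with a standard autoregressive loss
palm = PaLM(num_tokens=20000, dim=512, depth=8)

seq = torch.randint(0, 20000, (1, 1024))
loss = palm(seq, return_loss=True)
loss.backward()

# 2. Wrap the language model in a reward model and fit it on human-ranked data
reward_model = RewardModel(palm, num_binned_output=5)

seq = torch.randint(0, 20000, (1, 1024))
prompt_mask = torch.zeros(1, 1024).bool()   # True where tokens belong to the prompt
labels = torch.randint(0, 5, (1,))          # binned human feedback
reward_loss = reward_model(seq, prompt_mask=prompt_mask, labels=labels)
reward_loss.backward()

# 3. Fine-tune the policy with RLHF (PPO) against the trained reward model
prompts = torch.randint(0, 20000, (100, 256))   # token ids of training prompts
trainer = RLHFTrainer(palm=palm, reward_model=reward_model, prompt_token_ids=prompts)
trainer.train(num_episodes=50000)
```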
Issues (sorted by newest)
#60 · A bug in the implementation of top-p sampling · allblueJT · opened 1 month ago · 0 comments
#59 · Is there any documentation to train this on my own data? · gauravgandhi1315 · opened 8 months ago · 0 comments
#58 · How to use LoRA? · xiaoguzai · opened 8 months ago · 0 comments
#57 · Should the critic's input be the prompt only? · ginward · opened 11 months ago · 0 comments
#56 · Possible incorrect creation of rotary embeddings · AndyBarcia · closed 1 year ago · 1 comment
#55 · Update train.py · the-lord-nothing · closed 1 year ago · 0 comments
#54 · Flash Attention 2 · conceptofmind · closed 4 months ago · 0 comments
#52 · Implement an argument to directly set ff_inner_dim · chris-ha458 · opened 1 year ago · 3 comments
#51 · I looked at the LLaMA source code and there is an intermediate layer · wac81 · opened 1 year ago · 0 comments
#50 · Create 那个 · userodk · closed 1 year ago · 0 comments
#49 · Model Name · conceptofmind · closed 1 year ago · 3 comments
#48 · Is memory-efficient attention enabled by default if I don't use flash attention? · wac81 · opened 1 year ago · 3 comments
#47 · Speed up with flash attention on an A6000? · wac81 · closed 1 year ago · 2 comments
#46 · norm.gamma not used during backprop · conceptofmind · closed 1 year ago · 2 comments
#45 · I used other params with PaLM, but got an error · wac81 · closed 1 year ago · 4 comments
#44 · Column and Row Parallel Linear for Apex Tensor Parallel · conceptofmind · closed 1 year ago · 1 comment
#43 · Calculating the KL loss seems to have a mistake · Nightbringers · closed 1 year ago · 1 comment
#42 · Reason for using the pooled critic embedding instead of the last embedding for the value head · gblackout · closed 1 year ago · 3 comments
#41 · Confusion about KL divergence calculation for human feedback policies · dwyzzy · closed 1 year ago · 13 comments
#40 · Add PyTorch 2.0 Flash Attention · conceptofmind · closed 1 year ago · 17 comments
#39 · Mask raised an error · gongel · closed 1 year ago · 2 comments
#38 · KL divergence loss · taynoel · closed 1 year ago · 1 comment
#37 · Reward model training issue · wac81 · opened 1 year ago · 1 comment
#36 · Cannot train the model using PyTorch version 2? · linhduongtuan · closed 1 year ago · 1 comment
#35 · Value function · tonylin52 · opened 1 year ago · 0 comments
#34 · Test chat · strint · closed 1 year ago · 0 comments
#33 · Is it possible to train this AI using Open Assistant, or vice versa? · qwertystars · closed 1 year ago · 1 comment
#32 · Can we exploit the AGI ability of ChatGPT? · youkpan · closed 1 year ago · 0 comments
#31 · Is this shift right for the action logits? · kisseternity · closed 1 year ago · 4 comments
#30 · Do you need CUDA for this? · beew · closed 1 year ago · 1 comment
#29 · Are there any diagrams that describe the PaLM architecture? · guotong1988 · closed 1 year ago · 1 comment
#28 · Value function input · kkissmart · closed 1 year ago · 1 comment
#26 · KL_div/ratio on policy · kkissmart · closed 1 year ago · 0 comments
#24 · Is it possible to replace PaLM with another Hugging Face pretrained language model? · noanti · opened 1 year ago · 2 comments
#23 · ✨ 😅 Is it possible to use OpenAI's ChatGPT to train this ChatGPT? · Yonv1943 · opened 1 year ago · 8 comments
#22 · The loss function of the reward model · huzechuan · opened 1 year ago · 2 comments
#21 · A few questions on training · aakashrkumar · opened 1 year ago · 3 comments
#20 · How to fine-tune and train on my own data? · rbhatia46 · opened 1 year ago · 0 comments
#19 · Training the reward model · farhad-abdi · closed 1 year ago · 8 comments
#18 · PaLM-rlhf-pytorch Roadmap · HappyPony · closed 1 year ago · 4 comments
#17 · Help with computational power · byteunix · closed 1 year ago · 4 comments
#16 · Is it possible to release a version based on JAX? · sglucas · closed 1 year ago · 7 comments
#15 · Simple Web Interface · conceptofmind · closed 1 year ago · 2 comments
#14 · Why do the value calculations in generate and learn use different masks? · Nightbringers · closed 1 year ago · 1 comment
#13 · PaLM · Phob3tor · closed 1 year ago · 0 comments
#12 · Can we replace PPO + RLHF with a preference model (basically a transformer encoder plus a sigmoid head, trained with BCE), and during fine-tuning perform reward maximization by making the reward model predict 1s? · ssintelli · closed 1 year ago · 5 comments
#11 · I'm dumb · cardonasMind · closed 1 year ago · 1 comment
#10 · Bug fix: Correct function call in RewardModel->finetune_parameters · QasimWani · closed 1 year ago · 2 comments
#9 · Can I train a model on my own data? · sveisa · closed 1 year ago · 1 comment
#8 · Noob question: How can I use this model for inference? · PrasoonPratham · closed 1 year ago · 1 comment