l294265421 / alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat
https://88aeeb3aef5040507e.gradio.live/
MIT License
103 stars · 13 forks
Issues
Increasing max_prompt_len and max_ans_len causes an illegal memory access error during training
#16 · Luoxiaohei41 · opened 7 months ago · 0 comments
Training problem
#15 · wanghao-007 · opened 10 months ago · 0 comments
Step 3: Actor model and Reward model use different tokenizers
#14 · Kevin-myxu · opened 11 months ago · 0 comments
Padding side seems to differ between step2 and step3?
#13 · qiancheng99 · opened 11 months ago · 1 comment
A question about setting tokens
#12 · hepj987 · opened 1 year ago · 1 comment
element 0 of tensors does not require grad and does not have a grad_fn
#11 · Bill-Orz · opened 1 year ago · 5 comments
Fix pad_token_id bug
#10 · Ablustrund · closed 1 year ago · 2 comments
Should the tokens after EOS in the generated answer be masked out in Step 3?
#9 · Ablustrund · closed 1 year ago · 1 comment
Some questions about deepspeed.initialize
#8 · iamsile · closed 1 year ago · 8 comments
How to run it — need more details
#7 · SeekPoint · opened 1 year ago · 2 comments
V100 step3 OOM
#6 · iamsile · closed 1 year ago · 12 comments
Stuck at step2 evaluation_reward
#5 · murphypei · opened 1 year ago · 4 comments
Reward model training hangs on V100
#4 · iamsile · closed 1 year ago · 2 comments
GPU memory OOM during training on V100
#3 · iamsile · closed 1 year ago · 2 comments
Steps
#2 · syngokhan · opened 1 year ago · 1 comment
How are the training results?
#1 · Curious-chen · closed 1 year ago · 3 comments