vwxyzjn/lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
MIT License
145 stars, 7 forks
issues
#28 | Reward Shape | QiyaoWei | closed 7 months ago | 1 comment
#27 | Summarization TL;DR | vwxyzjn | opened 10 months ago | 0 comments
#26 | right_to_left_pad optimization | vwxyzjn | closed 9 months ago | 6 comments
#25 | Question about KL divergence computation | Maxtoq | closed 11 months ago | 3 comments
#24 | Various refactor | vwxyzjn | closed 11 months ago | 2 comments
#23 | Pass `eps` to adam optimizer and correct minor typos | liutianlin0121 | closed 11 months ago | 0 comments
#22 | refactor | vwxyzjn | closed 12 months ago | 2 comments
#21 | get benchmark results with TRL's pipeline | vwxyzjn | opened 12 months ago | 1 comment
#20 | Jax reward learning: improve lr recording and use numpy_collate in dataloader | liutianlin0121 | closed 1 year ago | 0 comments
#19 | Deepspeed integration for 7B models | vwxyzjn | closed 1 year ago | 1 comment
#18 | Jax policy learning | liutianlin0121 | closed 1 year ago | 12 comments
#17 | use pmap to normalize reward model | liutianlin0121 | closed 1 year ago | 4 comments
#16 | Docs improvement | vwxyzjn | closed 1 year ago | 0 comments
#15 | 2nd device (DO NOT MERGE) | vwxyzjn | closed 1 year ago | 0 comments
#14 | Bug fix / refactor | vwxyzjn | closed 1 year ago | 1 comment
#13 | Jax reward learning | liutianlin0121 | closed 1 year ago | 5 comments
#12 | add jax dependencies | vwxyzjn | closed 1 year ago | 1 comment
#11 | Add accelerate to poetry | liutianlin0121 | closed 1 year ago | 0 comments
#10 | Use `untrained_model` for normalize | vwxyzjn | closed 1 year ago | 0 comments
#9 | Add accelerate to poetry dependencies | liutianlin0121 | closed 1 year ago | 2 comments
#8 | A question about `normalize_after` | liutianlin0121 | closed 1 year ago | 3 comments
#7 | Name change: `left_padding_to_right_padding` --> `right_padding_to_left_padding` | liutianlin0121 | closed 1 year ago | 1 comment
#6 | Questions about `left_padding_to_right_padding` | liutianlin0121 | closed 1 year ago | 4 comments
#5 | Creating a jax implementation | vwxyzjn | closed 11 months ago | 0 comments
#4 | Bump gitpython from 3.1.31 to 3.1.32 | dependabot[bot] | closed 1 year ago | 1 comment
#3 | Use tensorflow-style adam | vwxyzjn | closed 1 year ago | 0 comments
#2 | Bump certifi from 2023.5.7 to 2023.7.22 | dependabot[bot] | closed 1 year ago | 1 comment
#1 | Bump aiohttp from 3.8.4 to 3.8.5 | dependabot[bot] | closed 1 year ago | 1 comment