mrahtz / learning-from-human-preferences

Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"

MIT License

301 stars, 67 forks

issues

Using Reward Predictor
#16 eunjuyummy opened 4 months ago (1 comment)

GRPC error
#15 errorer-max opened 1 year ago (3 comments)

GRPC error
#14 errorer-max closed 1 year ago (0 comments)

Bump pillow from 5.1.0 to 8.3.2
#12 dependabot[bot] closed 2 years ago (1 comment)

Does not run on Windows
#11 jgocm closed 2 years ago (2 comments)

Bump pillow from 5.1.0 to 8.2.0
#10 dependabot[bot] closed 3 years ago (1 comment)

Bump urllib3 from 1.22 to 1.26.5
#9 dependabot[bot] closed 3 years ago (1 comment)

Adjusting softmax function
#8 jakkarn closed 3 years ago (4 comments)

Bump pillow from 5.1.0 to 8.1.1
#7 dependabot[bot] closed 3 years ago (1 comment)

Bump psutil from 5.4.5 to 5.6.6
#6 dependabot[bot] closed 2 years ago (1 comment)

Doubt on normalizing rewards
#5 SestoAle closed 4 years ago (1 comment)

Extra instructions for Ubuntu
#4 eggsyntax opened 5 years ago (18 comments)

The output is always waiting for preferences, 0 so far.
#3 ZhanPython opened 5 years ago (3 comments)

Synthetic preferences - no preferences received
#2 JawwadF closed 6 years ago (4 comments)

Create LICENSE
#1 mrahtz closed 6 years ago (0 comments)