mrahtz/learning-from-human-preferences
Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
MIT License
301 stars, 67 forks
issues
#16 Using Reward Predictor (eunjuyummy, opened 4 months ago, 1 comment)
#15 GRPC error (errorer-max, opened 1 year ago, 3 comments)
#14 GRPC error (errorer-max, closed 1 year ago, 0 comments)
#12 Bump pillow from 5.1.0 to 8.3.2 (dependabot[bot], closed 2 years ago, 1 comment)
#11 Does not run on Windows (jgocm, closed 2 years ago, 2 comments)
#10 Bump pillow from 5.1.0 to 8.2.0 (dependabot[bot], closed 3 years ago, 1 comment)
#9 Bump urllib3 from 1.22 to 1.26.5 (dependabot[bot], closed 3 years ago, 1 comment)
#8 Adjusting softmax function (jakkarn, closed 3 years ago, 4 comments)
#7 Bump pillow from 5.1.0 to 8.1.1 (dependabot[bot], closed 3 years ago, 1 comment)
#6 Bump psutil from 5.4.5 to 5.6.6 (dependabot[bot], closed 2 years ago, 1 comment)
#5 Doubt on normalizing rewards (SestoAle, closed 4 years ago, 1 comment)
#4 Extra instructions for Ubuntu (eggsyntax, opened 5 years ago, 18 comments)
#3 The output is always waiting for preferences, 0 so far. (ZhanPython, opened 5 years ago, 3 comments)
#2 Synthetic preferences - no preferences received (JawwadF, closed 6 years ago, 4 comments)
#1 Create LICENSE (mrahtz, closed 6 years ago, 0 comments)