icoxfog417 / baby-steps-of-rl-ja
Sample code for "Pythonで学ぶ強化学習 -入門から実践まで-" (Reinforcement Learning with Python: From Introduction to Practice)
Apache License 2.0 · 431 stars · 262 forks
Issues
#31 Execution result of DP/bellman_equation.py differs (yujirokatagiri, closed, 5 years ago, 1 comment)
#30 About the equations in day4 (sirogamichandayo, closed, 5 years ago, 1 comment)
#29 Fix script name in README (tyfkda, closed, 5 years ago, 1 comment)
#28 p.20: Typo in the "sum of rewards" equation (ryamauchi, closed, 5 years ago, 1 comment)
#27 _ (haruhiko28, closed, 5 years ago, 1 comment)
#26 About Environment holding Action in Code 1.2 (tanakataiki, closed, 5 years ago, 2 comments)
#25 Setting observe_interval of train_loop to a non-zero value raises an error (icoxfog417, closed, 5 years ago, 1 comment)
#24 baby-steps-of-rl-ja/DP/tests fails because it references the undefined action_space (yosukesan, closed, 5 years ago, 1 comment)
#23 Fixed ValuteIterationPlanner to ValueIterationPlanner (funwarioisii, closed, 5 years ago, 1 comment)
#22 Note that since Conda 4.6, virtual environments can also be activated from PowerShell (icoxfog417, closed, 5 years ago, 0 comments)
#21 Fix Implementation of Policy Gradient & A2C (#15, #16) (icoxfog417, closed, 5 years ago, 0 comments)
#20 About the noise in A2C's SampleLayer (tatsuya-ogawa, closed, 5 years ago, 5 comments)
#19 Fix not implemented exception (#11) (icoxfog417, closed, 5 years ago, 0 comments)
#18 Fix shape description (#14) (icoxfog417, closed, 5 years ago, 0 comments)
#17 Update to Edition 3 (icoxfog417, closed, 5 years ago, 0 comments)
#16 About the gradient computation in A2C (muupan, closed, 5 years ago, 3 comments)
#15 4.4: About the parameter update in Policy Gradient (slaypni, closed, 5 years ago, 8 comments)
#14 p.132 code: misunderstanding about shape (akch-kk, closed, 5 years ago, 1 comment)
#13 Policy Iteration is not always faster (icoxfog417, closed, 5 years ago, 0 comments)
#12 The TD-lambda equation contains an error (icoxfog417, closed, 5 years ago, 0 comments)
#11 p.107 code (mori97, closed, 5 years ago, 0 comments)
#10 Typo on p.2 (akch-kk, closed, 5 years ago, 1 comment)
#9 Typo in the code on p.28 (yshr10ic, closed, 5 years ago, 1 comment)
#8 Will elementary schoolers mock you for using Python 2? (icoxfog417, open, 5 years ago, 1 comment)
#7 Fix issue #6: Calculation of Expected Reward of Policy Iteration (icoxfog417, closed, 5 years ago, 0 comments)
#6 p.44 code (ExpIPiP1E0, closed, 5 years ago, 2 comments)
#5 Installing the sample code: Window => Windows (icoxfog417, closed, 5 years ago, 0 comments)
#4 p.34: Typo in an equation (takushi-m, closed, 5 years ago, 5 comments)
#3 p.28 code 1-6: Mistake in the listed program (ariacat3366, closed, 5 years ago, 1 comment)
#2 About "Whether to use experience to update state evaluation or policy: Off policy vs On policy" (icoxfog417, closed, 5 years ago, 0 comments)
#1 Value approximation is rendered as "state evaluation", but "value evaluation" is more appropriate (icoxfog417, closed, 5 years ago, 0 comments)