p28 code1-6: Mistake in the printed program

icoxfog417 / baby-steps-of-rl-ja

Sample code for the book "Pythonで学ぶ強化学習 -入門から実践まで-" (Reinforcement Learning with Python: From Introduction to Practice)

Apache License 2.0

431 stars 262 forks

Issue

In code1-6 on p28, the `while` block inside the `main` function is not indented correctly.

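Location

[x] Day1: Understanding where reinforcement learning fits
[ ] Day2: Solving reinforcement learning (1): Planning from the environment
[ ] Day3: Solving reinforcement learning (2): Planning from experience
[ ] Day4: Applying neural networks to reinforcement learning
[ ] Day5: Weaknesses of reinforcement learning
[ ] Day6: Methods for overcoming the weaknesses of reinforcement learning
[ ] Day7: Application areas of reinforcement learning

Page number: p28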

Environment

OS:
Python version:
Output of pip freeze (attached below)

Error details

(Attach exception messages, logs, screenshots, etc.)

def main():
    # Make grid environment.
    grid = [
        [0, 0, 0, 1],
        [0, 9, 0, -1],
        [0, 0, 0, 0]
    ]
    env = Environment(grid)
    agent = Agent(env)

    # Try 10 game.
    for i in range(10):
        # Initialize position of agent.
        state = env.reset()
        total_reward = 0
        done = False

        while not done:
            action = agent.policy(state)
            next_state, reward, done = env.step(action)
            total_reward += reward
            state = next_state

        print("Episode {}: Agent gets {} reward.".format(i, total_reward))
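
To check the intended nesting of the `main` loop without the book's full code, a minimal sketch like the following may help. `StubEnvironment`, `StubAgent`, and `run_episodes` are hypothetical stand-ins introduced here for illustration; they are not the repository's `Environment` and `Agent` classes and only mimic the `reset`, `step`, and `policy` calls used in `main`. The assumption is that the `while` loop should sit inside the `for` loop, with the `print` running once per episode after the `while` loop finishes.

```python
class StubEnvironment:
    """Hypothetical stand-in: an episode ends after a fixed number of steps."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.steps = 0

    def reset(self):
        # Start a new episode; the state is simply the step counter.
        self.steps = 0
        return self.steps

    def step(self, action):
        # Every step yields a reward of 1 until the episode length is reached.
        self.steps += 1
        reward = 1
        done = self.steps >= self.episode_length
        return self.steps, reward, done


class StubAgent:
    """Hypothetical stand-in: always returns the same action."""

    def policy(self, state):
        return 0


def run_episodes(n_episodes=10):
    env = StubEnvironment()
    agent = StubAgent()
    totals = []
    for i in range(n_episodes):
        state = env.reset()
        total_reward = 0
        done = False

        # The while loop is nested in the for loop, so each episode
        # runs until the environment reports done.
        while not done:
            action = agent.policy(state)
            next_state, reward, done = env.step(action)
            total_reward += reward
            state = next_state

        # Reported once per episode, after the while loop has ended.
        totals.append(total_reward)
        print("Episode {}: Agent gets {} reward.".format(i, total_reward))
    return totals
```

Under these stub assumptions, calling `run_episodes()` prints ten lines, one per episode, each reporting a total reward of 3. If the `while` block were dedented to the level of the `for` statement, it would run only once, after the last `reset`, and only one episode would actually be played.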
